Language Engineering; Harnessing the Power of Language
1 of 18
http://www.hltcentral.org/usr_docs/Harness/harness-en.htm
Language Engineering
Harnessing the Power of Language
Contents
Language Today
Language in Action
Language is Fundamental
Making Language Work for Us
Techniques and Resources
What is Language Engineering?
Components of the Technology
Techniques
Speaker Identification and Verification
Speech Recognition
Character and Document Image Recognition
Natural Language Understanding
Natural Language Generation
Speech Generation
Language Resources
Lexicons
Specialist Lexicons
Grammars
Corpora
The Chain of Development and Application
The Impact of Language Engineering
Competing in a Global Market
Better Information
Direct Access to Services
Commerce in the Marketspace
Effective Communication
Accessibility and Participation
Improved Education Opportunities
Entertainment, Leisure, and Creativity
The Benefits
Glossary - Commonly used Terminology
Language Today
Language in Action
Language is the natural means of human communication; the most effective way we have to express
ourselves to each other. We use language in a host of different ways: to explain complex ideas and concepts;
to manage human resources; to negotiate; to persuade; to make our needs known; to express our feelings; to
narrate stories; to record our culture for future generations; and to create beauty in poetry and prose. For most
of us language is fundamental to all aspects of our lives.
The use of language is currently restricted. In the main, it is used only in direct communication between
human beings and not in our interactions with the systems, services and appliances which we use every day
of our lives. Even between humans, understanding is usually limited to those groups who share a common
language. In this respect language can sometimes be seen as much a barrier to communication as an aid.
04/10/2005 02:32 PM
A change is taking place which will revolutionise our use of language and greatly enhance the value of
language in every aspect of communication. This change is the result of developments in Language
Engineering.
Language Engineering provides ways in which we can extend and improve our use of language to make it a
more effective tool. It is based on a vast amount of knowledge about language and the way it works, which
has been accumulated through research. It uses language resources, such as electronic dictionaries and
grammars, terminology banks and corpora, which have been developed over time. The research tells us what
we need to know about language and develops the techniques needed to understand and manipulate it. The
resources represent the knowledge base needed to recognise, validate, understand, and manipulate language
using the power of computers. By applying this knowledge of language we can develop new ways to help
solve problems across the political, social, and economic spectrum.
Language Engineering is a technology which uses our knowledge of language to enhance our application of
computer systems:
improving the way we interface with them
assimilating, analysing, selecting, using, and presenting information more effectively
providing human language generation and translation facilities.
New opportunities are becoming available to change the way we do many things, to make them easier and
more effective by exploiting our developing knowledge of language.
When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a
variety of languages, we shall all have easier access to the benefits of a wide range of information and
communications services, as well as the facility to carry out business transactions remotely, over the
telephone or other telematics services.
When a machine understands human language, translates between different languages, and generates speech
as well as printed output, we shall have available an enormously powerful tool to help us in many areas of
our lives.
When a machine can help us quickly to understand each other better, this will enable us to co-operate and
collaborate more effectively both in business and in government.
The success of Language Engineering will be the achievement of all these possibilities. Already some of
these things can be done, although they need to be developed further. The pace of advance is accelerating and
we shall see many achievements over the next few years.
Language is Fundamental
Language is a means of effective, efficient communication. It is also a medium for recording and assimilating
information; in practice, the most convenient way of representing most of the information we need. Language
is vital both to our business activities and to our administration. It is also very important in many of the
social, cultural and political aspects of our lives. Language is integral to our culture. It helps each of us to
define ourselves.
For each one of us, our own language is fundamental to our national and cultural identity, providing a link to
our traditions as well as the foundation of our education and entertainment.
In Europe we have the benefit of a diversity of languages and cultures, which means that we have the
opportunity to learn a great deal about each other’s culture and way of life. This remains one of the bases for
a cohesive European society. If the benefits of a multi-lingual society are to remain a feature of the European
way of life then we must explore ways in which to overcome the barriers to communication and
understanding.
It is sometimes said that it is possible to use only one or two languages for international activities in business,
administration and politics. To a certain extent this is true. However, it could never be entirely satisfactory.
The dominance of a few languages would be an unacceptable imbalance of power as well as a poor use of
resources.
Above all, relying on only a few languages significantly reduces the number of people who can participate effectively in any activity and
this is bound to exclude valuable contributions and lead to discontent. In time, such an approach would also
marginalise the languages which are not used so widely, reducing further the scope of their usage and
inevitably diminishing the richness and variety of our culture. It would adversely affect not only our feeling
for national, regional and cultural identities, but also our sense of belonging to a truly European society, not
just tolerant of its minorities but supportive of them, recognising their value.
Such a restrictive approach to language use would also limit the availability of a wide range of important new
services and facilities by denying many people access to computer systems in their native language.
Europe’s position as a naturally multi-lingual community in a multi-lingual world can be used to our
commercial advantage. As we endeavour to collaborate more closely, to develop the single market as our
home market, we have a special incentive to develop solutions to the problems of a multi-lingual market
place. In successfully supporting our own language needs, especially in business, administration and
education, Language Engineering will help us to compete for business in the global marketplace. On the one
hand, our businesses will have a competitive edge through their experience in using technology to service the
needs of a multi-lingual marketplace. On the other hand, we shall also have language products to sell to the
rest of the world.
A pattern of life-long learning is expected to be one of the significant features of the Information Society. It is
also recognised that managers of the future will need to be capable in more than one language. Language
Engineering will make an important contribution to the development of personal tuition systems, not only for
language learning but also in developing systems which adapt more effectively to the needs of the student.
Language enabled products will improve the performance of business and administration as well as
individuals. Products which are developed using language technology will revolutionise our systems and
enhance the range of services available to business, government and the public at large.
Speech recognition, understanding, and generation by computer will make human-computer interaction more
efficient as well as more human. Natural language understanding by machines will deliver our information
needs with more precision and sensitivity, helping us to overcome the problem of having too much
information to cope with.
Computer-aided translation services and the generation of documents in foreign languages will not only
improve our dealings within Europe but will also help to give us greater access to external markets.
Making Language Work for Us
Our ability to develop our use of language holds the key to the multi-lingual information society; the
European society of the future. New developments in Language Engineering will enable us to:
access information efficiently, focusing precisely on the information we need, saving time and
avoiding information overload
talk to our computer systems, at home as well as at work, in our cars and in public places where we
need information or assistance
teach ourselves other languages and improve our use of our own, at our convenience: in our own time;
at our own pace; and in our own place
do business efficiently over the telephone by interacting reliably and directly with voice operated
computer systems; even instruct our PCs to carry out transactions on our behalf
learn more about what is happening around us, locally, nationally and internationally and have a
greater influence on decisions affecting our lives
operate more effectively internationally, in business, in administration, in political activities and as
citizens and consumers
provide a wider range of better services to the maximum number of fellow citizens, colleagues and
customers.
Techniques and Resources
What is Language Engineering?
Language Engineering is the application of knowledge of language to the development of computer systems
which can recognise, understand, interpret, and generate human language in all its forms. In practice,
Language Engineering comprises a set of techniques and language resources. The former are implemented in
computer software and the latter are a repository of knowledge which can be accessed by computer software.
Components of the Technology
The basic processes of Language Engineering are shown in the diagram below. These are broadly concerned
with:
entering material into the computer, using speech, printed text or handwriting, or text either keyed in or
introduced electronically
recognising the language of the material, distinguishing separate words, for example, recording it in
symbolic form and validating it
building an understanding of the meaning of the material, to the appropriate level for the particular
application
using this understanding in an application such as transformation (e.g. speech to text), information
retrieval, or human language translation
generating the medium for presenting the results of the application
finally, presenting the results to human users via a display or other output device: a screen, a printer or a
plotter; a loudspeaker or the telephone.
Model of a Language Enabled System
Within this general model there are, of course, many different configurations. Depending on the application
of the technology, not all these components are needed.
Techniques
There are many techniques used in Language Engineering and some of these are described below.
Speaker Identification and Verification
A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and
to use this identification as the basis for verifying that the individual is entitled to access a service or a
resource. Typical problems which have to be overcome include recognising that the speech is live rather than
recorded, isolating the voice from noise (either in the environment or in the transmission medium), and
identifying the speaker reliably despite temporary changes in the voice (such as those caused by illness).
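The verification step can be pictured as comparing a new voice sample with a voiceprint recorded when the customer enrolled. What follows is a minimal sketch only, assuming each voice has already been reduced to a fixed-length feature vector; the vectors, the speaker names and the 0.85 acceptance threshold are hypothetical, and cosine similarity is one common choice of comparison rather than a method named in the text. It does not tackle the recording, noise or illness problems listed above.

```python
import math

# Hypothetical acceptance threshold; a real system would tune it on held-out
# data to balance false acceptances against false rejections.
ACCEPT_THRESHOLD = 0.85


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled: dict[str, list[float]], claimed_id: str,
                   sample: list[float], threshold: float = ACCEPT_THRESHOLD) -> bool:
    """Accept the identity claim only if the sample is close to the enrolled voiceprint."""
    voiceprint = enrolled.get(claimed_id)
    if voiceprint is None:
        return False  # unknown identity: never grant access
    return cosine_similarity(voiceprint, sample) >= threshold


# Hypothetical enrolled voiceprints (already-extracted feature vectors).
enrolled = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.8, 0.3]}
print(verify_speaker(enrolled, "alice", [0.88, 0.12, 0.41]))  # True
print(verify_speaker(enrolled, "alice", [0.1, 0.8, 0.3]))     # False
```

Refusing unknown identities outright, rather than comparing against every enrolled speaker, reflects the difference between verification (checking a claim) and identification (finding who is speaking).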
Speech Recognition
The sound of speech is received by a computer as analogue waveforms, which are analysed to identify the
units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used
to recognise discrete or continuous speech input. The production of quality statistical models requires
extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be
collected, for this purpose.
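The text does not say which statistical models are meant; hidden Markov models are one widely used family, and the sketch below uses a toy one to show how a most-likely phoneme sequence is recovered from observations with the Viterbi algorithm. Every probability is invented, the acoustic frames are assumed to have already been quantised into the coarse labels `burst` and `vowel`, and in practice these numbers would be estimated from the training corpora described above.

```python
import math

# Toy model (all numbers invented): hidden states are the phonemes of "cat",
# observations are acoustic frames already quantised into coarse labels.
states = ["k", "ae", "t"]
start_p = {"k": 0.6, "ae": 0.3, "t": 0.1}
trans_p = {
    "k": {"k": 0.1, "ae": 0.8, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.3, "t": 0.6},
    "t": {"k": 0.3, "ae": 0.4, "t": 0.3},
}
emit_p = {
    "k": {"burst": 0.7, "vowel": 0.3},
    "ae": {"burst": 0.1, "vowel": 0.9},
    "t": {"burst": 0.7, "vowel": 0.3},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence, computed in log space to avoid underflow."""
    # Each column maps state -> (best log-probability of a path ending there, previous state).
    columns = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
                for s in states}]
    for obs in observations[1:]:
        prev_col = columns[-1]
        col = {}
        for s in states:
            # Choose the predecessor that maximises the path score into state s.
            best_prev = max(states, key=lambda p: prev_col[p][0] + math.log(trans_p[p][s]))
            score = prev_col[best_prev][0] + math.log(trans_p[best_prev][s])
            col[s] = (score + math.log(emit_p[s][obs]), best_prev)
        columns.append(col)
    # Trace back from the best-scoring final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


frames = ["burst", "vowel", "vowel", "burst"]
print(viterbi(frames, states, start_p, trans_p, emit_p))  # ['k', 'ae', 'ae', 't']
```

The repeated `ae` simply means that the vowel spans two frames; a recogniser would merge such repeats before matching the phoneme string against words in a lexicon.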
There are a number of significant problems to be overcome if speech is to become a commonly used medium
for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech
which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is
to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular
individual. There is also the serious problem of the noise which can interfere with recognition, either from the
environment in which the speaker uses the system or through noise introduced by the transmission medium,
the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to
allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally,
there is the problem of dealing with accents, dialects, and language which is spoken, as it often is, ungrammatically.
Character and Document Image Recognition
Recognition of written or printed language requires that a symbolic representation of the language be derived
from the spatial arrangement of its graphical marks. For most languages this means recognising and transforming
characters. There are two cases of character recognition:
recognition of printed images, referred to as Optical Character Recognition (OCR)
recognising handwriting, usually known as Intelligent Character Recognition (ICR)
OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the
font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the
case of handwriting, good results can only be achieved by using ICR. This involves word recognition
techniques which use language models, such as lexicons or statistical information about word sequences.
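The lexicon side of word recognition can be illustrated by snapping an uncertain character-level reading to the closest known word. This is a minimal sketch only: the tiny lexicon, the misread inputs and the 0.7 cutoff are hypothetical, Python's standard `difflib` similarity ratio stands in for a proper recognition model, and the statistical information about word sequences mentioned above is not shown.

```python
import difflib

# Hypothetical miniature lexicon; a real one would hold hundreds of thousands of entries.
LEXICON = ["language", "engineering", "lexicon", "grammar", "corpus", "speech"]


def correct_word(raw: str, lexicon: list[str] = LEXICON, cutoff: float = 0.7) -> str:
    """Replace an uncertain recognised word with its closest lexicon entry.

    If nothing in the lexicon is similar enough, keep the raw reading rather
    than forcing a possibly wrong substitution.
    """
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


# Typical character confusions: "1" for "l", "c" for "e", "rn" for "m".
print([correct_word(w) for w in ["1anguage", "enginecring", "grarnmar", "xyz"]])
# ['language', 'engineering', 'grammar', 'xyz']
```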
Document image analysis is closely associated with character recognition but involves the analysis of the
document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and
then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the
text effectively.
Natural Language Understanding
The understanding of language is obviously fundamental to many applications. However, perfect
understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful
preliminary step in the process because it makes it possible to select intelligently which parts of the material
to take to deeper levels of understanding.
Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts
efficiently. This initial analysis can then be used, for example, to focus on ‘interesting’ parts of a text for a
deeper semantic analysis which determines the content of the text within a limited domain. It can also be
used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown
words automatically, which can then be added to the system’s knowledge.
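A shallow pass of this kind can be as simple as counting domain cue words in each sentence and passing on only the promising sentences for deeper analysis. The sketch below assumes a hypothetical shipping domain, a hand-picked cue list and a crude punctuation-based sentence splitter; none of these come from the text, and real shallow analysers typically also use part-of-speech tagging or phrase chunking.

```python
import re

# Hypothetical domain cue words for a shipping-related application.
DOMAIN_CUES = {"cargo", "vessel", "manifest", "port", "shipment"}


def split_sentences(text: str) -> list[str]:
    """Crude sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_interesting(text: str, min_hits: int = 2) -> list[str]:
    """Shallow pass: keep sentences containing at least `min_hits` distinct cue words."""
    selected = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if len(tokens & DOMAIN_CUES) >= min_hits:
            selected.append(sentence)
    return selected


text = ("The weather was mild. The vessel left port with an incomplete manifest. "
        "Staff enjoyed lunch.")
print(select_interesting(text))
# ['The vessel left port with an incomplete manifest.']
```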
Semantic models are used to represent the meaning of language in terms of concepts and relationships
between them. A semantic model can be used, for example, to map an information request to an underlying
meaning which is independent of the actual terminology or language in which the query was expressed. This
supports multi-lingual access to information without a need to be familiar with the actual terminology or
structuring used to index the information.
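The idea of mapping a request to a meaning that is independent of its wording can be sketched with a table from surface terms, in several languages, to language-independent concept identifiers. The concept names and the term table here are hypothetical, and a real semantic model would also represent the relationships between concepts, which this sketch omits.

```python
# Hypothetical concept inventory: surface terms in several languages map to
# language-independent concept identifiers.
TERM_TO_CONCEPT = {
    "train": "RAIL_TRAVEL", "railway": "RAIL_TRAVEL", "zug": "RAIL_TRAVEL", "tren": "RAIL_TRAVEL",
    "timetable": "SCHEDULE", "fahrplan": "SCHEDULE", "horario": "SCHEDULE",
}


def query_to_concepts(query: str) -> set[str]:
    """Map a free-text query to the concepts it mentions, ignoring unknown words."""
    return {TERM_TO_CONCEPT[w] for w in query.lower().split() if w in TERM_TO_CONCEPT}


english = query_to_concepts("railway timetable")
german = query_to_concepts("Zug Fahrplan")
spanish = query_to_concepts("horario del tren")
print(english == german == spanish)  # True
```

Because all three queries reduce to the same concept set, the information behind them can be indexed once, by concept, rather than once per language.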
Combinations of analysis and generation with a semantic model allow texts to be translated. At the current
stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so
that adequate Language Engineering resources can be applied. Templates for document structure, as well as
common phrases with variable parts, can be used to aid generation of a high quality text.
Natural Language Generation
A semantic representation of a text can be used as the basis for generating language. An interpretation of
basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected
fashion: either in a chosen language or according to stylistic specifications set by a text planning system.
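Template-based generation, with common phrases whose variable parts are filled from a meaning representation, can be sketched as follows. The `DEPARTURE` event, its fields and the three phrase templates are hypothetical examples; Python's standard `string.Template` does the slot filling, and a full generator would add text planning, agreement and word-order handling that this sketch leaves out.

```python
from string import Template

# Hypothetical phrase templates with variable parts, one per output language.
TEMPLATES = {
    "en": Template("The train to $destination departs at $time."),
    "fr": Template("Le train pour $destination part à $time."),
    "de": Template("Der Zug nach $destination fährt um $time ab."),
}


def generate(meaning: dict[str, str], language: str) -> str:
    """Render a language-independent meaning representation as a sentence."""
    if meaning.get("event") != "DEPARTURE":
        raise ValueError("this sketch only knows how to express DEPARTURE events")
    return TEMPLATES[language].substitute(destination=meaning["destination"],
                                          time=meaning["time"])


meaning = {"event": "DEPARTURE", "destination": "Lyon", "time": "14:05"}
for lang in ("en", "fr", "de"):
    print(generate(meaning, lang))
```

The same meaning is rendered once per language; only the templates differ, which is why such systems work best in narrow domains with a limited set of phrases.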
Speech Generation
Speech is generated from filled templates, by playing ‘canned’ recordings or by concatenating units of speech
(phonemes, words). Generated speech has to account for aspects such as intensity, duration and
stress in order to produce a continuous and natural response.
Dialogue can be established by combining speech recognition with simple generation, either from
concatenation of stored human speech components or synthesising speech using rules.
Providing a library of speech recognisers and generators, together with a graphical tool for structuring their
application, allows someone who is neither a speech expert nor a computer programmer to design a
structured dialogue which can be used, for example, in automated handling of telephone calls.
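A structured dialogue of the kind such a tool might produce can be represented as a small state machine: each state has a prompt to speak and a table of recognised responses leading to the next state. This is a hypothetical sketch; the banking states and prompts are invented, the speech recogniser is replaced by a list of already-recognised caller replies, and the spoken output is simply collected as text.

```python
# Hypothetical dialogue for automated call handling. Each state has a prompt
# (which a real system would speak via canned recordings or synthesis) and a
# map from recognised responses to the next state.
DIALOGUE = {
    "start": ("Do you want your balance or to pay a bill?",
              {"balance": "balance", "pay": "pay_bill"}),
    "balance": ("Your balance will now be read out. Goodbye.", {}),
    "pay_bill": ("Which bill would you like to pay?",
                 {"gas": "confirm", "phone": "confirm"}),
    "confirm": ("Payment scheduled. Goodbye.", {}),
}


def run_dialogue(responses: list[str], start: str = "start") -> list[str]:
    """Walk the dialogue using already-recognised caller replies; return the prompts spoken."""
    state, spoken = start, []
    replies = iter(responses)
    while True:
        prompt, transitions = DIALOGUE[state]
        spoken.append(prompt)
        if not transitions:           # terminal state: the call ends
            return spoken
        answer = next(replies, None)
        if answer is None:            # caller hung up
            return spoken
        # An unrecognised answer keeps the same state, so the prompt is repeated.
        state = transitions.get(answer.lower(), state)


for line in run_dialogue(["pay", "gas"]):
    print(line)
```

Keeping the dialogue as data rather than code is what allows a non-programmer to design it with a graphical tool; the engine that walks the states stays the same.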
Language Resources
Language resources are essential components of Language Engineering. They are one of the main ways of
representing the knowledge of language, which is used for the analytical work leading to recognition and
understanding.
The work of producing and maintaining language resources is a huge task. Resources are produced in many EU
languages by research laboratories and public institutions, according to standard formats and protocols which
enable access to them. Many of these resources are being made available through the European Language
Resources Association (ELRA).
Lexicons
A lexicon is a repository of words and knowledge about those words. This knowledge may include details of
the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the
word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful
lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
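One way to picture a lexicon entry holding the kinds of knowledge listed above is as a small record, indexed so that any inflected form leads back to its headword. The field names and the sample entry are hypothetical and far simpler than any real lexicon format; choosing the right sense from context is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One headword with illustrative morphological, phonological and semantic fields."""
    lemma: str
    part_of_speech: str
    forms: dict[str, str]          # morphology: grammatical feature -> inflected form
    pronunciation: str             # phonology, here as an IPA transcription
    senses: list[str] = field(default_factory=list)


class Lexicon:
    def __init__(self, entries: list[LexiconEntry]):
        # Index every form so that an inflected word such as "mice" finds "mouse".
        self._by_form: dict[str, list[LexiconEntry]] = {}
        for entry in entries:
            for form in {entry.lemma, *entry.forms.values()}:
                self._by_form.setdefault(form, []).append(entry)

    def lookup(self, word: str) -> list[LexiconEntry]:
        """Return every entry whose lemma or inflected forms match the word."""
        return self._by_form.get(word.lower(), [])


lexicon = Lexicon([
    LexiconEntry("mouse", "noun", {"plural": "mice"}, "/maʊs/",
                 ["small rodent", "pointing device for a computer"]),
])
print([e.lemma for e in lexicon.lookup("mice")])  # ['mouse']
```

Returning a list rather than a single entry leaves room for homographs, where one written form belongs to several headwords.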
Specialist Lexicons
There are a number of special cases which are usually researched and produced separately from general
purpose lexicons:
Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so
that they can be recognised within their context as places, objects, people, or perhaps animals. They take on
a special significance in many applications, however, where the name is key to the application such as in a
voice-operated navigation system, a holiday reservations system, or a railway timetable information system,
based on automated telephone call handling.
Terminology: In today’s complex technological environment there are a host of terminologies which need to
be recorded, structured and made available for language enhanced applications. Many of the most
cost-effective applications of Language Engineering, such as multi-lingual technical document management
and machine translation, depend on the availability of the appropriate terminology banks.
Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms,
collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator
workbenches and intelligent office automation facilities for authoring.
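In information retrieval, a wordnet is often used to expand a query with related words so that documents using different vocabulary are still found. The fragment below is a hypothetical two-word wordnet stored as a plain dictionary, not the interface of any real wordnet resource; by default only synonyms are added, since expanding with antonyms would usually retrieve the wrong documents.

```python
# Hypothetical fragment of a wordnet: each word lists related words by relation type.
WORDNET = {
    "car": {"synonym": ["automobile", "motorcar"]},
    "buy": {"synonym": ["purchase"], "antonym": ["sell"]},
}


def expand_query(terms: list[str], relations: tuple[str, ...] = ("synonym",)) -> list[str]:
    """Add words linked by the chosen relations, preserving order and avoiding duplicates."""
    expanded: list[str] = []
    for term in terms:
        related = [w for r in relations for w in WORDNET.get(term, {}).get(r, [])]
        for word in [term] + related:
            if word not in expanded:
                expanded.append(word)
    return expanded


print(expand_query(["buy", "car"]))
# ['buy', 'purchase', 'car', 'automobile', 'motorcar']
```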
Grammars
A grammar describes the structure of a language at different levels: word (morphological grammar), phrase,
sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics
and discourse).
Corpora
A corpus is a body of language, either text or speech, which provides the basis for:
analysis of language to establish its characteristics
training a machine, usually to adapt its behaviour to particular circumstances
verifying empirically a theory concerning language
a test set for a Language Engineering technique or application to establish how well it works in
practice.
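When a corpus serves as a test set, a technique's output is compared with the corpus's reference material. For speech recognition a standard measure is the word error rate: the minimum number of word substitutions, deletions and insertions needed to turn the recogniser's output into the reference transcript, divided by the length of the reference. The sketch computes it with the usual dynamic-programming edit distance; the two test pairs are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between the first i reference words and first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution or match
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


# Hypothetical test set: (reference transcript from the corpus, recogniser output).
test_set = [
    ("open the window", "open the window"),
    ("turn on the radio", "turn the radio on"),
]
scores = [word_error_rate(ref, hyp) for ref, hyp in test_set]
print(scores)  # [0.0, 0.5]
```

The second pair scores 0.5 because the misplaced "on" costs one deletion and one insertion against four reference words.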
There are national corpora of hundreds of millions of words but there are also corpora which are constructed
for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a
simulated control system that recognises spoken commands; such a corpus can then help establish the
user requirements for a voice-operated control system intended for the market.
The Chain of Development and Application
The diagram below depicts the chain of activities which are involved in Language Engineering, from research
to the delivery of language-enabled and language-enhanced products and services to end-users. The process
of research and development leads to the development of techniques, the production of resources, and the
development of standards. These are the basic building blocks.
Model of Language Engineering Activities
In practice, Language Engineering is applied at two levels. At the first level there are a number of generic
classes of application, such as:
language translation
information management (multi-lingual)
authoring (multi-lingual)
human/machine interface (multi-lingual voice and text)
At the second level, these enabling applications are applied to real world problems across the social and
economic spectrum. So, for example:
information management can be used in an information service, as the basis for analysing requests for
information and matching the request, against a database of text or images, to select the information
accurately
authoring tools are typically used in word processing systems but can also be used to generate text,
such as business letters in foreign languages, as well as in conjunction with information management,
to provide document management facilities
human language translation is currently used to provide translator workbenches and automatic
translation in limited domains
most applications can usefully be provided with natural language user interfaces, including speech, to
improve their usability.
In general, language capability is embedded in systems to enhance their performance. Language Engineering
is an ‘enabling technology’.
The Impact of Language Engineering
Language technologies can be applied to a wide range of problems in business and administration to produce
better, more effective solutions. They can also be used in education, to help the disabled, and to bring new
services both to organisations and to consumers. There are a number of areas where the impact is significant:
competing in a global market
providing information for business, administration and consumers
offering services directly through tele-business
supporting electronic commerce
enabling effective communications
ensuring easier accessibility and participation
improving opportunities for education and self development
enhancing entertainment, leisure and creativity.
Competing in a Global Market
Business success increasingly depends on the ability to compete in a global marketplace. Success is based on
the ability to identify markets, sell into them effectively and provide the quality of aftersales service expected
by customers. There are many areas where the application of Language Engineering can lead to greater
efficiency and reduced costs. Such applications are:
generation of business letters and other commercial documentation in the appropriate language
production and management of multi-lingual customer documentation
provision of computer-aided translation services
localisation of company procedures, staff handbooks, etc.
in-line translation of electronic communications
globalisation and localisation of computer systems and their user interfaces.
Better Information
One of the key features of an information service is its ability to deliver information which meets the
immediate, real needs of its client in a focused way. It is not sufficient to provide information which is
broadly in the category requested, in such a way that the client must sift through it to extract what is useful.
Equally, if the way that the information is extracted leads to important omissions, then the results are at best
inadequate and at worst they could be seriously misleading.
Information is available throughout the world, on the World Wide Web, for example, in different languages.
In reality, however, it is only available to a client who can firstly request the information in the language in
which it is recorded and then understand the language in which the information is presented. Using machine
translation facilities, the person seeking information will be able to complete an information request in his or
her native language and receive the information in that same language, regardless of the language in which
the information is recorded.
Language Engineering can improve the quality of information services by using techniques which not only
give more accurate results to search requests, but also increase greatly the possibility of finding all the
relevant information available. Techniques such as concept searches, which match a semantic analysis of the
search criteria against a semantic analysis of the database, can give far better results than
simple keyword searches.
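The difference between the two kinds of search shows up clearly in recall. In the sketch below, a literal keyword search for "hospital job" finds nothing, because no advertisement uses the word "job", while a concept search finds both relevant advertisements. The concept table and the three documents are hypothetical, and a simple term-to-concept lookup stands in for the full semantic analysis described above.

```python
# Hypothetical concept map: several surface terms express the same concept.
CONCEPTS = {
    "job": "EMPLOYMENT", "vacancy": "EMPLOYMENT", "position": "EMPLOYMENT",
    "nurse": "HEALTHCARE", "hospital": "HEALTHCARE", "clinic": "HEALTHCARE",
}

# Hypothetical document collection.
DOCUMENTS = {
    1: "vacancy for a nurse at the city hospital",
    2: "clinic position available",
    3: "hospital parking charges",
}


def keyword_search(query: str) -> set[int]:
    """Return documents containing every query word literally."""
    words = query.lower().split()
    return {doc_id for doc_id, text in DOCUMENTS.items()
            if all(w in text.split() for w in words)}


def concepts(text: str) -> set[str]:
    return {CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS}


def concept_search(query: str) -> set[int]:
    """Return documents whose concepts include every concept in the query."""
    wanted = concepts(query)
    if not wanted:
        return set()  # nothing recognisable in the query
    return {doc_id for doc_id, text in DOCUMENTS.items() if wanted <= concepts(text)}


print(keyword_search("hospital job"))  # set()
print(concept_search("hospital job"))  # {1, 2}
```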
One of the major, direct benefits of the Information Society for the ordinary citizen will be the improvement
in public service information. However, the wide accessibility of this information will depend upon Language
Engineering. People who are not familiar with the conventional user interface of a computer system will be
able to request information by voice and the system will guide them through the possibilities. Those who
want information about other countries, which may be held in a foreign language, will be able to receive it in
their own language. A good example of this is a service which is currently being developed which will
provide information about job opportunities across the European Union in the native language of the potential
applicant. Obviously these are jobs where language skills are not significant. The service will be available on
the Internet and it is also planned to have public booths where job seekers can use the service. In a
mono-lingual pilot service run in Flanders, a surprising 26% of applications for jobs were received from
applicants who had seen the details on the Internet.
Language Engineering will make a contribution in a large number of public interest areas. Intelligence
gathering for law enforcement is an interesting case. In detecting smuggling, for example, there is a large
amount of information available from public or commercial sources which, if collated and presented in the
right way, can give clear indications of suspicious activity. Details about ship movements, manifests and
company information can highlight abnormal profiles of activity. The ability of language based analysis to
produce these profiles is an important aid.
Direct Access to Services
In recent years there has been an explosion in the use of the telephone to deliver services such as banking,
arranging insurance cover, and providing help desk facilities. The advantage of this type of service to the
customer is that it provides a rapid response, ‘around the clock’. For the supplier it is cost-effective because
the business does not have to be conducted from expensive retail premises. Using speaker identification and
speech recognition techniques it is possible to automate many of these services. A customer’s telephone call
can be dealt with by a computer system which is capable of having a meaningful dialogue with the caller and
delivering the service to the customer’s satisfaction. Perhaps the most obvious example today is the
automation of the telephone banking services which are already available from many banks. The customer
telephoning the service would be answered by a computer which would, firstly, analyse the characteristics of
the customer’s voice to identify it and verify the customer’s rights of access to the service. Then a dialogue
would be conducted between the customer and the computer to establish the services required and to
complete any transactions needed, e.g. paying a bill, providing a statement and so forth. Other examples
could be ordering tickets for the theatre, making reservations for a journey by rail, ship, or aeroplane, and
home shopping via cable television.
Apart from the economic advantage of automating services to provide ‘around the clock’ availability, automation also
removes the need for people to work long and unsociable hours to provide the necessary coverage. Services
are likely to be more consistent, fast, and reliable. In addition the automatic recording of an audit trail for
each transaction will mean that each party to the transaction can feel confident about its outcome.
Commerce in the Marketspace
Many of the actions involved in a business transaction, such as ordering, invoicing, and sending payment
instructions to the bank, can be completed without the need for human intervention using, for example, EDI
(Electronic Data Interchange) technology. However, at the present time, most business transactions are
initiated by a dialogue between humans either on the telephone, in writing, or face-to-face. With
improvements in the availability of telematics services and with the increasing use of the Internet and the
World Wide Web, opportunities to automate more activities in the commercial cycle (see illustration below)
have increased. Language enabled software will play a prominent role in making this automation easier to use
and more effective.
The Cycle of Commerce
To the human user one of the advantages of the World Wide Web is that information is published in natural
language. However, for a software agent to scan and select information from the Web, it must be given
the intelligence to understand the published information and match it to the requirements of its user.
Language Engineering can make a significant contribution to the development of intelligent agents which
provide consumers with an easy way of using the facilities of electronic commerce. A consumer
could instruct such an agent, by voice, to browse the Web or any similar service, to read catalogues and select
suitable products, to look for and negotiate prices, and even to assemble bids in an electronic auction. When the
results have been reviewed, the consumer would then tell the agent to place the order and, after
delivery, instruct the bank to pay an electronic invoice. The consumer would see none of the complexity of
the underlying commercial transactions, which would be handled by the agent.
After-sales service can also be improved by using hypertext-based electronic help desks with additional
language enabled facilities. The benefits of this automation are immense. Apart from reducing costs
throughout the business transaction cycle, it allows a wider choice of suppliers and products to be reviewed and
assessed for suitability, and it stimulates competitive pricing. The whole process will be faster and more
efficient and, once the relevant information has been recorded, the accuracy of all the processes derived from it
can be assured.
In time, electronic commerce will change the business model itself. There will be less need for middlemen.
New and small enterprises will be able to make the world aware of their products and services quickly,
effectively and without too much expense. However, without language understanding and multi-lingual
capability, these benefits cannot be fully realised.
Effective Communication
Communication is probably the most obvious use of language. On the other hand, language is also the most
obvious barrier to communication. Across cultures and between nations, difficulties arise all the time not only
because of the problem of translating accurately from one language to another, but also because of the
cultural connotations of words and phrases. A typical example in the European context is the word ’federal’,
which can mean a devolved form of government to someone who already lives in a federation, but to
someone living in a unitary sovereign state, it is likely to mean the imposition of another level of more
remote, centralised government.
As the application of language knowledge gives translators better support, through electronic dictionaries,
thesauri, and other language resources, and eventually as high quality machine translation becomes a
reality, these barriers will be lowered. Agreements at all levels, whether political or commercial, will be
drafted better and more quickly in a variety of languages. International working will become more effective with
a far wider range of individuals able to contribute. An example of a project which is successfully helping to
improve communications in Europe is one which interconnects many of the police forces of northern Europe
using a limited, controlled language which can be automatically translated in real time. Such a facility not
only helps in preventing and detecting international crime, but also assists the emergency services to
communicate effectively during a major incident.
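The police example works because senders are restricted to a fixed set of approved phrases, each of which can be translated in advance and simply looked up at run time. The sketch below illustrates that idea only. The phrase table, its French renderings, and the function name `translate_controlled` are hypothetical and are not taken from the project described above; a real system would have a much larger vocabulary and a controlled grammar, not just a phrase list.

```python
# Sketch of controlled-language translation by phrase lookup.
# ASSUMPTION: the approved phrases and their French renderings are
# illustrative only; they are not the vocabulary of any real police system.
APPROVED_PHRASES = {
    "vehicle stolen": "Véhicule volé",
    "suspect armed": "Suspect armé",
    "road closed": "Route fermée",
    "request assistance": "Demande d'assistance",
}


def translate_controlled(message):
    """Translate a message built from approved phrases separated by full stops.

    Raises ValueError for any phrase outside the controlled language, so the
    sender can rephrase rather than transmit something untranslatable.
    """
    translated = []
    for phrase in message.lower().split("."):
        phrase = " ".join(phrase.split())  # normalise spacing
        if not phrase:
            continue  # skip the empty piece after the final full stop
        if phrase not in APPROVED_PHRASES:
            raise ValueError(f"not in controlled language: {phrase!r}")
        translated.append(APPROVED_PHRASES[phrase])
    return ". ".join(translated) + "." if translated else ""


print(translate_controlled("Vehicle stolen. Suspect armed."))
# prints: Véhicule volé. Suspect armé.
```

Rejecting anything outside the vocabulary, rather than guessing, is what makes the translation predictable enough for emergency use.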
Accessibility and Participation
One of the most important ways in which Language Engineering will have a significant impact is in the use
of human language, especially speech, to interface with machines. This improves the usability of systems and
services. It will also help to ensure that services can be used not just by the computer literate but by ordinary
citizens without special training. This aspect of accessibility is fundamental to a democratic, open, and
equitable society in the Information Age.
A good example of the type of service which will be available is an automated legal advice service. The
accessibility of the justice system to all citizens is becoming a serious problem in many societies where the
cost of legal expertise and the process of law prevent all but the very rich, and those qualifying for legal aid,
from exercising their legal rights. Using language based techniques, it will be possible not only to provide
advice which is based on an understanding of the problem and an analysis of the relevant body of law, but
also to understand a natural language description of the problem and deliver the advice, as a human lawyer
would have done, in spoken or printed form. Such a service could be made available through kiosks in court
buildings or post offices, for example. This type of application can also be used to inform citizens of social
security entitlements and job opportunities, as well as to provide a usable, comprehensible interface to more
open government.
Systems with the capacity to communicate with their users interactively, through human language, available
either through access points in public places or in the home, via the telephone network or TV cables, will
make it possible to change the nature of our democracy. There will be a potential for participation in the
decision-making process through a far greater availability of information in understandable and ’objective’
form and through opinion gathering on a very large scale. Many people whose lives are affected by disability
can be helped through the application of language technology. Computers with an understanding of language,
able to listen, see and speak, will offer new opportunities to access services at home and participate in the
workplace.
Improved Education Opportunities
Distance learning has become an important part of the provision of education services. It is especially
important to the concept of ’life-long learning’ which is expected to become an important feature of life in
the Information Age. The effectiveness of distance learning and self-study is improved by using telematics
services and computer aided learning. The quality and success of computer aided learning can be greatly
enhanced by the use of Language Engineering techniques. If the computer aided learning package can
understand the answers which its users give to questions, rather than simply recognise that the answer is right
or wrong, it can direct them down a path which is more appropriate to their needs. In this way, students are
likely to learn more effectively and have a longer concentration span, because a more sensitive package is
inherently more comfortable to work with.
In future, in Europe, it will be essential in many walks of life to be competent in more than one language. Of
course, computer aided language learning (CALL) is an area of prime importance for the application of
Language Engineering. The same knowledge that is essential to the machine’s ability to understand is also
the basis for the interactive teaching process, providing quality diagnostics of student errors as well as
illustrating correct usage. New, more effective learning facilities at home and at work will greatly increase
the opportunities to expand our knowledge and develop new skills.
Entertainment, Leisure and Creativity
The attraction of computer games to our children is a clear indication of the potential of the computer to
affect our culture. Home entertainment can become more educational, while education can become more
attractive: ’edutainment’, as it has become known. Tele-presence in virtual environments
such as museums, art galleries and libraries will provide a rich cultural experience, available to a wide section
of society in the comfort and convenience of their own homes. Virtual visits to such cultural archives will be
aided by language technology that enables the searching and selection of all forms of digitised language based
records, the indexing and retrieval of images, the dubbing of films, the automatic production of subtitles, and
the translation of library and archive material.
For a wider range of people, writing can become a more exciting activity. Authoring tools will make it
possible for them to achieve much higher quality results. The use of on-line dictionaries and thesauri, for
example, makes selection of the ’mot juste’ more likely, and grammar can be checked. The result can be a far
more satisfying experience for writers who are not naturally gifted or well educated but who want to express
themselves effectively in their business or social correspondence.
The Benefits
The benefits to be gained from successful Language Engineering are immense. They include:
enhanced ability to compete in global markets
improved service from our public administration and public service agencies
wide accessibility of information through easier use of computer systems and Information Services
saving time by using intelligent computer systems as our agents
improvements in the quality of information recorded in information systems
better filtering of information when we need it
more effective international co-operation
improved safety through ’hands-free’ operation of equipment
greater security through voice verification techniques
reduced stress in ’hands-busy’ and ’eyes-busy’ situations
greater opportunities to integrate the disabled into everyday working and social activities
better communications with foreign business partners
greater availability of information about other countries’ goods and services, employment prospects, weather and traffic conditions
more opportunities to educate ourselves at our convenience
greater cohesion within Europe, turning our natural interdependence into an easier, more rewarding, working relationship.
Glossary - Commonly used Terminology
The following glossary describes some of the commonly used terminology of Language Engineering. Each
term is classified as being: [a] - acronym; [adj] - adjective; [n] - noun; [p] - phrase; [v] - verb.
abstract
[n]
a short, concise description of a document, which covers the full scope of its contents
ambiguity
[n]
a state whereby a word or sentence can be understood in different ways: a word because it has more than one meaning, a sentence because its structure can be analysed in more than one way, each conveying a different meaning
authoring tools
[p]
facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents
CALL
[a]
Computer Aided Language Learning
character recognition
[p]
see Character and Document Image Recognition
computational linguistics
[p]
an area of applied linguistics concerned with the processing of natural language by computers
concept search
[p]
used in the context of information retrieval to mean that the search is made using a semantic analysis of the search filter matched against a semantic analysis of the database
continuous speech
[p]
speech where the speaker makes no allowances for the listener (e.g. a speech recognition device) by pausing between words
controlled language (also artificial language)
[p]
language which has been designed to restrict the number of words and the structure of the language used, in order to make language processing easier; typical users of controlled language work in an area where precision of language and speed of response are critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.
corpus (plural corpora)
[n]
see Corpora
dialogue
[n]
an interactive, two-way, alternating flow of language between two individuals, between an individual and a machine, or between two machines
dictionary
[n]
a list of words and a description of each, usually confined to describing their meaning and possibly their etymology
discourse
[n]
a contiguous stretch of language comprising more than one sentence
discourse analysis
[p]
analysis to identify the linguistic dependencies which exist between sentences
document image recognition
[p]
see Character and Document Image Recognition
domain
[n]
usually applied to the area of application of the language enabled software, e.g. banking, insurance, travel, etc.; its significance in Language Engineering is that the vocabulary of an application is restricted, so limiting the domain of application effectively limits the language resource requirements
formalism
[n]
a means to represent the rules used in the establishment of a model of
linguistic knowledge
generate
[v]
to produce language in one form from another form of language or
information; see also Speech Generation and Natural Language Generation
globalisation
[n]
the process of preparing software for use in any language and cultural
environment either by designing it to be usable in this way or by adding
facilities to existing software to facilitate subsequent localisation (see below)
grammar
[n]
see Grammars
grammar checker
[p]
a software facility which checks text for the correctness of its grammar
hidden Markov model
[p]
a finite state machine in which not only transitions are probabilistic but also
output; currently used in speech recognition systems to help to determine the
words represented by the sound wave forms captured
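To make this entry concrete: the states of the model are hidden, each state emits observable symbols with some probability, and recognisers commonly use the Viterbi algorithm to find the most probable sequence of hidden states for what was observed. The toy model below is an assumption for illustration only (two invented sound classes emitting coarse "loud"/"quiet" labels, with made-up probabilities); real speech recognisers use far larger models over continuous acoustic features.

```python
import math

# ASSUMPTION: a toy HMM with invented states, symbols and probabilities.
STATES = ("vowel", "consonant")
START = {"vowel": 0.5, "consonant": 0.5}
TRANS = {  # probability of moving from one hidden state to the next
    "vowel": {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.6, "consonant": 0.4},
}
EMIT = {  # probability of each observable symbol given the hidden state
    "vowel": {"loud": 0.8, "quiet": 0.2},
    "consonant": {"loud": 0.3, "quiet": 0.7},
}


def viterbi(observations):
    """Return the most probable hidden state sequence for the observations.

    Works with log-probabilities to avoid underflow on long inputs.
    """
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMIT[s][observations[0]]), [s])
        for s in STATES
    }
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            # Pick the predecessor that gives the most probable path into s.
            score, path = max(
                (best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES
            )
            new_best[s] = (score + math.log(EMIT[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]


print(viterbi(["loud", "quiet", "loud"]))
# prints: ['vowel', 'consonant', 'vowel']
```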
hypertext
[n]
a system commonly used for help files and in the World Wide Web whereby
highlighted text is used to provide a link (rather like an index) to related text
(often a more detailed explanation of the item highlighted)
index
[v]
to build a concise means of reference to information within a database which,
for textual information, can be based on keywords or concepts
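For textual information, a keyword index of the kind described here is commonly built as an inverted index, which maps each keyword to the documents that contain it. The sketch below assumes simple lower-case tokenisation on letters and a tiny hypothetical stop-word list; indexing on concepts rather than keywords would additionally need the semantic analysis mentioned under concept search, which is not shown.

```python
import re
from collections import defaultdict

# ASSUMPTION: a tiny illustrative stop-word list; real systems use larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "by"}


def keywords(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in keywords(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every keyword in the query."""
    words = keywords(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "Ordering tickets for the theatre",
    2: "Paying a bill by telephone",
    3: "Theatre reviews and tickets",
}
print(search(build_index(docs), "theatre tickets"))
# prints: {1, 3}
```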
information extraction
[p]
the process of selecting information from a database using indices based on
keywords, semantics, and/or concept searching
information retrieval
[p]
usually used as a generic term to cover the access to and delivery of
information from natural language databases by whatever method
interlingua
[n]
an invented language which can be used as a common, formal representation
into which source natural language may be translated and from which target
natural language can be generated
interpret
[v]
generally, to attribute meaning to language; but also, to translate from one
language to another, usually orally, in real-time
language enabled
[p]
describes a computer application which has been improved in functionality, performance and/or presentation by the use of Language Engineering
language engineering
[p]
the application of knowledge of language to the development of computer
systems which can recognise, understand, interpret and generate human
language in all its forms
language resources
[p]
see Language Resources
lemmatise
[v]
to break an inflected word into its root (base form) and ending components
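A crude approximation of this for regular English inflections is suffix stripping: try known endings and accept the split only if what remains is a known root. The ending list and root lexicon below are hypothetical and deliberately tiny, and irregular forms such as "ran" or "mice" cannot be handled by rules like these at all; they need a lexicon lookup.

```python
# ASSUMPTION: a hypothetical, deliberately tiny set of endings and roots.
ENDINGS = ("ing", "ed", "es", "s")  # "es" is tried before "s"
ROOTS = {"walk", "book", "box", "parse", "translate"}


def lemmatise(word):
    """Split an inflected word into (root, ending as matched).

    Returns (word, "") when the word is already a root or no rule applies.
    """
    word = word.lower()
    if word in ROOTS:
        return word, ""
    for ending in ENDINGS:
        if word.endswith(ending):
            stem = word[: -len(ending)]
            if stem in ROOTS:
                return stem, ending
            # Restore a final "e" dropped before the ending: "parsing" -> "parse".
            if stem + "e" in ROOTS:
                return stem + "e", ending
    return word, ""


print(lemmatise("walked"), lemmatise("parsing"), lemmatise("ran"))
# prints: ('walk', 'ed') ('parse', 'ing') ('ran', '')
```

Checking candidate roots against a lexicon is what stops naive stripping from turning, say, "bus" into "bu" + "s".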
lexicon
[n]
see Lexicons
localise
[v]
to adapt software to the local requirements in terms of language and culture
(including legal practice and business conventions, for example)
machine translation
[p]
the process of automatically translating from one language to another by a
computer
machine aided translation
[p]
the process of assisting a human translator in translating from one language to
another using computer software tools
machine readable
[p]
a dictionary (see above) which can be read by computer dictionary software
mark up
[v]
to annotate text so that its structure and presentation are defined in such a way
that the structure can be reproduced by a software system other than that used
for its creation
morpheme
[n]
the smallest meaningful element of language
morphology
[n]
the science of the structure of words
multi-lingual
[adj]
properly used to mean that something exists in a form that can handle several
languages but often used to describe the characteristic that versions exist in
several languages
natural language generation
[p]
see Natural Language Generation
natural language processing
[p]
a term in use since the 1980s to define a class of software systems which
handle text intelligently
OCR
[a]
see Optical Character Recognition below
Optical Character Recognition
[p]
see Character and Document Image Recognition
onomastics
[n]
scientific investigation of proper names (see Specialist Lexicons)
parse
[v]
to analyse language in order to establish its structure and relationships at the
level of syntax and/or semantics
phoneme
[n]
the smallest unit of sound (analogous to a morpheme) which can be identified
from an acoustic flow of speech and which is semantically distinct
proper names
[p]
see Specialist Lexicons
semantics
[n]
the analysis of language to determine meaning
shallow parser
[p]
software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective
speaker identification
[p]
see Speaker Identification and Verification
speaker independent
[p]
describes a speech recognition system which is capable of recognising speech
regardless of the speaker, i.e. it does not need to be trained to recognise
individual speakers
speaker verification
[p]
see Speaker Identification and Verification
speech recognition
[p]
see Speech Recognition
speech generation
[p]
see Speech Generation
speech to text
[p]
the process of analysing speech and producing its textual equivalent; a typical
example of a speech to text application is in dictation systems
spell checker
[p]
software which checks the spelling of words
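A common approach flags words missing from a word list and suggests the known words closest to them by edit (Levenshtein) distance, that is, the fewest single-character insertions, deletions and substitutions needed to turn one word into the other. The word list and the `max_distance` cut-off below are assumptions for illustration; a real checker would load a full lexicon and rank suggestions more carefully.

```python
# ASSUMPTION: a toy word list standing in for a full lexicon.
WORDS = {"language", "engineering", "grammar", "dictionary", "thesaurus", "speech"}


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def check(word, max_distance=2):
    """Return (is_known, suggestions), with suggestions closest first."""
    word = word.lower()
    if word in WORDS:
        return True, []
    scored = sorted((edit_distance(word, w), w) for w in WORDS)
    return False, [w for d, w in scored if d <= max_distance]


print(check("grammer"))
# prints: (False, ['grammar'])
```

Plain Levenshtein distance counts a swapped pair of letters ("langauge") as two edits; the Damerau variant counts it as one, which often suits typing errors better.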
style check
[p]
software which checks a document to ensure that it conforms to a template
defining the structure of the text and the document containing it; also the
checking of the use of phrases or sentences in a predefined way
summarise
[v]
to produce a concise description of a document, which covers the full scope
of its contents
syllable
[n]
a unit of pronunciation which is more than a single sound (see phoneme
above) and smaller than a word
syntax
[n]
the system of rules which describe how sentences can be formed from basic
elements of language, i.e. morphemes, words and parts of speech
tag
[v]
to annotate a corpus by attaching information to the words, which describes
the grammatical context of the words and/or associations with other words
terminology
[n]
see Specialist Lexicons
text
[n]
used frequently to distinguish written, printed, or symbolically recorded
(using character encoding) language from speech
text alignment
[p]
the process of aligning different language versions of a text in order to be able
to identify equivalent terms, phrases, or expressions
text to speech
[p]
the process of producing the speech equivalent of text; a typical example of a
text to speech application is an automatic announcement system at an airport
or railway station
thesaurus
[n]
a dictionary of synonyms
translate
[v]
to transfer a text from one language to another
translation memory
[p]
a system which builds knowledge about translating from one language to
another by remembering and re-using previous translations
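In practice a translation memory typically returns an exact match when a sentence has been translated before and otherwise a "fuzzy" match: the most similar stored sentence, if it is similar enough for the translator to adapt. The sketch below scores similarity with Python's standard `difflib.SequenceMatcher.ratio()`; the 0.75 threshold, the class name and the sample segments are hypothetical choices for illustration, not a standard.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Stores source/target segment pairs and retrieves the closest match."""

    def __init__(self, threshold=0.75):
        # ASSUMPTION: 0.75 is an illustrative fuzzy-match cut-off.
        self.threshold = threshold
        self.segments = {}  # source segment -> stored translation

    def add(self, source, target):
        """Remember a translation produced by a human translator."""
        self.segments[source] = target

    def lookup(self, source):
        """Return (stored_source, translation, similarity), or None.

        An exact match has similarity 1.0; otherwise the most similar stored
        segment is returned if its similarity reaches the threshold.
        """
        if source in self.segments:
            return source, self.segments[source], 1.0
        best = None
        for stored, target in self.segments.items():
            score = SequenceMatcher(None, source, stored).ratio()
            if score >= self.threshold and (best is None or score > best[2]):
                best = (stored, target, score)
        return best


tm = TranslationMemory()
tm.add("The road is closed.", "La route est fermée.")
print(tm.lookup("The road is now closed.")[1])
# prints: La route est fermée.
```

The fuzzy match is offered to the translator for correction rather than inserted automatically, which is how such memories support, rather than replace, a human translator.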
translator’s workbench
[p]
a software system providing a working environment for a human translator,
which offers a range of aids such as on-line dictionaries, thesauri, translation
memories, etc.
user modelling
[p]
usually, in dialogue based speech recognition, a component which attempts to
be sensitive to the various sorts of users that the system may encounter
utterance
[n]
the string of sounds produced by a speaker between two pauses
version
[n]
an edition of a document which is recorded as different from the previous
edition
version control
[p]
the management of the production, recording, and issue of documents
voice authentication
[p]
speaker verification
voice recognition
[p]
speech recognition
wizard of Oz testing
[p]
testing in which the automated machine component is substituted by some
form of human intervention but in such a way that the user participating in the
test is unaware of the substitution
wordnet
[n]
see Specialist Lexicons