COMPUTER APPLICATIONS OF LINGUISTIC THEORY
1.1. FACT AND FICTION

Natural language vs. computer language: natural language is the basis of
human communication, and a computer language is not just one more
language of the same kind.

The syntax of both can be modeled on the same types of formal systems.
But natural language syntax is subject to semantic ambiguity and context
sensitivity, whereas computer languages are designed to avoid both. So
there is no formal system capable of providing a perfect semantic account
of both kinds of language.

Warren Weaver in 1949 viewed translation as a code-breaking problem. A
sentence encodes a message, which is the meaning of the sentence, so
translation is decryption followed by encryption. But there is no key
that supplies the mapping between the surface form of the sentence (the
encrypted form) and a statement of the message in a universal system of
concepts (its unencrypted form).

A range of natural language applications is now readily available on
personal computers, for example allowing PC users to query relational
databases in natural language.
1.2. THE LEAP FROM LINGUISTIC THEORY TO PROGRAMS

Linguistic theory vs. computational applications.

Generative linguistics has aimed to characterize the linguistic knowledge
of an idealized speaker-hearer.

Computational linguistics tries to mediate between competence theory
and the particular type of linguistic performance attributable to machines
by turning linguistic theory into algorithms. The aim is to permit the
simulation of linguistic behavior while obeying the linguistic constraints
and generalizations implied in both linguistic theory and competence
grammars.

The machine does not encode the knowledge that underlies the behavior in
humans. Computer applications therefore need a theoretical base that can
support the desired behavior.

Programs of the complexity of natural language understanding systems
rely on linguistic technology, which is derived from linguistic theory.
The boundary between theory and application lies in computational
linguistics.
1.3. COMPUTATIONAL LINGUISTICS

Computational linguistics is the branch of artificial intelligence (AI)
concerned with the investigation and modeling of a cognitive capacity.

The goal is to identify and characterize the classes of processes and the
types of knowledge involved in the ability to communicate and assimilate
information using natural language.

There are two problems in embedding linguistic knowledge in a computer
implementation:
1. Converting a competence grammar into a parser.
2. Integrating multiple linguistic knowledge sources in the analysis
process.

Computational linguistics relies on insights from a number of disciplines
within both computer science and linguistics. Its major areas of activity are
parsing, language generation, natural language understanding, machine
translation and speech.
1.4. PARSING

Parsing is the recovery of structure from a signal in which the structure
is not apparent. It is a crucial step in any kind of natural language
processing.

Grammars provide an explicit definition of string membership in a
language and of the association of strings with structures. They facilitate
the correction and expansion of the grammar itself and the development of
new parsing algorithms.
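
The two roles of a grammar mentioned here (deciding string membership and
associating strings with structures) can be illustrated with a minimal
sketch. The toy grammar, the lexicon and the function names below are all
hypothetical and chosen only for illustration; the parser is a naive
top-down one and would not cope with left-recursive rules.

```python
# Toy context-free grammar and lexicon (hypothetical, for illustration only).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "Kim": "Name",
    "sees": "V", "sleeps": "V",
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can span words[start:end]."""
    if symbol not in GRAMMAR:  # lexical category: match a single word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Each partial analysis is (children found so far, position reached).
        partial = [([], start)]
        for child in rhs:
            partial = [
                (kids + [tree], end)
                for kids, pos in partial
                for tree, end in parse(child, words, pos)
            ]
        for kids, end in partial:
            yield (symbol, *kids), end

def structures(sentence):
    """Return every S tree that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

def is_member(sentence):
    """String membership: the sentence has at least one structure."""
    return bool(structures(sentence))
```

Because the grammar is stated separately from the parsing procedure, the
rules can be corrected or extended without touching `parse`, and a
different algorithm could be substituted over the same rules.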

Research in the 1950s and 1960s resulted in an understanding of the
relation between the complexity of a language for the purposes of parsing
and the form of the grammar that generates it (Chomsky 1959, 1963). For
languages generated by the least restricted grammars, there are strings
that cannot be analyzed by a computer regardless of how much time and
memory it has.

A way round the problem is to explicitly limit the set of sentences
admitted to a system, that is, to define a sublanguage. The introduction
of strictly defined sublanguages has improved machine translation, though
this is only possible when the creation of the source language text can be
controlled.

A grammar-based parser can be constructed for a sublanguage, and
“parse-fitting” techniques can be invoked when input that falls outside
the sublanguage is encountered.

A “rapprochement” between parsing technology and linguistic theory can
be achieved by developing parsers better suited to deal with complex
grammars. But the parsing algorithm would have to be capable of reversing
the effects of long sequences of ordered applications of transformations.
PARSING SYSTEMS

The formalism that allowed this was the Augmented Transition Network
(ATN) grammar, which is a Recursive Transition Network (RTN) grammar
with the addition of registers and conditions that allow transformations
(Woods 1970). ATN grammars are formulated in such a way that the
regularities they encode can be exploited in the parsing process.
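
The idea of a recursive transition network extended with registers and
conditions can be sketched as follows. This is only a loose illustration
in the spirit of an ATN, not Woods's formalism: the networks, the lexicon,
the register names and the single agreement condition are all assumptions
made for the example.

```python
# word -> (category, number); hypothetical toy lexicon
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", "sg"), "dogs": ("N", "pl"),
    "barks": ("V", "sg"), "bark": ("V", "pl"),
}

# network -> state -> arcs of the form (kind, label, next_state)
#   "cat"  consumes one word of category `label`
#   "push" recursively traverses the network named `label`
#   "pop"  marks a final state
NETWORKS = {
    "S": {0: [("push", "NP", 1)], 1: [("cat", "V", 2)],
          2: [("pop", None, None)]},
    "NP": {0: [("cat", "Det", 1)], 1: [("cat", "N", 2)],
           2: [("pop", None, None)]},
}

def traverse(net, state, words, pos, regs):
    """Yield (end_position, registers) for each path through `net`."""
    for kind, label, nxt in NETWORKS[net][state]:
        if kind == "pop":
            yield pos, regs
        elif kind == "cat" and pos < len(words):
            cat, number = LEXICON.get(words[pos], (None, None))
            if cat != label:
                continue
            # Condition: a verb must agree with the number register, if set.
            if cat == "V" and regs.get("number") not in (None, number):
                continue
            new = dict(regs)
            new[cat] = words[pos]  # action: store the word in a register
            if number is not None:
                new["number"] = number
            yield from traverse(net, nxt, words, pos + 1, new)
        elif kind == "push":
            for end, sub in traverse(label, 0, words, pos, {}):
                new = dict(regs)
                new[label] = sub  # register holding the sub-analysis
                if "number" in sub:
                    new["number"] = sub["number"]
                yield from traverse(net, nxt, words, end, new)

def analyze(sentence):
    """Return the register sets of all complete traversals of network S."""
    words = sentence.split()
    return [regs for end, regs in traverse("S", 0, words, 0, {})
            if end == len(words)]
```

Here `analyze("the dog barks")` yields one register set, while
`analyze("the dog bark")` yields none because the agreement condition on
the verb arc fails.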

An important development was the invention of the chart (Martin Kay), a
data structure accessed by the parser during analysis, and the advent of
active chart parsing (Kaplan), which enhanced the possibility of exploring
the use of different scheduling techniques for ATN parsing.
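
A chart can be pictured as a table recording which categories have been
found over which stretches of the input, so that no constituent has to be
analyzed twice. The sketch below is a simplified, passive chart filled
bottom-up over binary rules; it is not Kay's or Kaplan's implementation.
Active chart parsing additionally stores partially completed edges and
uses an agenda to schedule them. The grammar and lexicon are hypothetical.

```python
from collections import defaultdict

# Binary rules: (left category, right category) -> parent category
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

def build_chart(words):
    """Map each span (start, end) to the set of categories found over it."""
    chart = defaultdict(set)
    n = len(words)
    # Lexical edges: one per known word.
    for i, word in enumerate(words):
        if word in LEXICON:
            chart[(i, i + 1)].add(LEXICON[word])
    # Combine adjacent edges into wider ones, shortest spans first.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        parent = BINARY.get((left, right))
                        if parent:
                            chart[(start, end)].add(parent)
    return chart

def recognize(sentence):
    """True if an S edge spans the whole sentence."""
    words = sentence.split()
    return "S" in build_chart(words)[(0, len(words))]
```

For "the dog sees the cat" the chart ends up holding an NP over words
0-2, a VP over words 2-5 and an S over the whole input.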

From a computational point of view, Lexical-Functional Grammar (LFG) is a
development of the ATN theory. From a linguistic point of view, LFG is an
extension of base-generated syntax.

LFG satisfies the needs both of linguistic adequacy and of ease of use in
parsing and psychological modeling. The actions and conditions of the ATN
are replaced by a set of constraints, which is a significant feature of
several current versions of syntactic theory.
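
Constraint-based formalisms are commonly implemented by unifying feature
structures: each word or rule contributes partial information, and an
analysis is accepted only if all contributions are compatible. The
following is a minimal sketch of that general technique using nested
dictionaries. It is not LFG's actual constraint solver, and the feature
names are assumptions made for the example.

```python
def unify(a, b):
    """Merge two feature structures; return None if their values clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # incompatible values for the same feature
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values are compatible only if they are identical.
    return a if a == b else None

# Information contributed by a subject noun phrase (hypothetical features).
subject = {"SUBJ": {"PRED": "dog", "NUM": "sg"}}
# Constraints contributed by the verb "barks": it wants a singular subject.
barks = {"PRED": "bark", "TENSE": "pres", "SUBJ": {"NUM": "sg"}}
# Constraints contributed by "bark": it wants a plural subject.
bark = {"PRED": "bark", "TENSE": "pres", "SUBJ": {"NUM": "pl"}}
```

With these definitions `unify(subject, barks)` produces a single merged
structure, whereas `unify(subject, bark)` returns `None` because the two
sources disagree about the subject's number. No ordering of actions is
involved; the result is the same whichever structure is supplied first.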

A merger of linguistic theory and parser design was accomplished by
Marcus in his PARSIFAL system, whose point of departure was
transformational grammar. He explains a number of constraints and
conditions postulated in that theory on the basis of central properties of
the parser design.

These systems are written in some dialect of the programming language
LISP. More recently, a number of natural language systems have been
developed in PROLOG, and a whole line of grammar formalisms (logic
grammars) has been developed to make it simpler to express linguistic
rules in a form that is appropriate for use with the PROLOG interpreter.
These grammars have so far had little connection with ongoing research in
linguistic theory.

There are as many different types of parsing as there are different aspects
of linguistic structure.
1.5. TEXT-TO-SPEECH CONVERSION (speech synthesis)

The best examples of commercially available linguistic technology are
found in the area of speech synthesis and text-to-speech conversion.

Several linguistic theories have contributed to text-to-speech
conversion:
Speech synthesis: a theory of speech production.
Morphophonemic, phonological and prosodic rules.

The conversion has to be indirect: a text-to-speech device requires a
specification of the correspondences between letters and sounds.
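
A specification of letter-to-sound correspondences can be sketched as an
ordered list of rewrite rules, with longer letter sequences tried first.
The rules and phoneme symbols below are a hypothetical toy set covering
only a handful of words. A real system would also need context-sensitive
rules and an exception dictionary.

```python
# Letter sequence -> phoneme symbols (toy rule set, for illustration only).
_RULES = {
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ph": ["F"],
    "f": ["F"], "i": ["IH"], "s": ["S"], "h": ["HH"],
    "p": ["P"], "t": ["T"], "n": ["N"], "c": ["K"],
}
# Try longer patterns first so that "sh" wins over "s" followed by "h".
RULES = sorted(_RULES.items(), key=lambda rule: -len(rule[0]))

def letters_to_sounds(word):
    """Convert a spelled word into a list of phoneme symbols."""
    word = word.lower()
    sounds = []
    i = 0
    while i < len(word):
        for pattern, phones in RULES:
            if word.startswith(pattern, i):
                sounds.extend(phones)
                i += len(pattern)
                break
        else:
            i += 1  # no rule applies: skip the letter
    return sounds
```

For example, `letters_to_sounds("ship")` gives `["SH", "IH", "P"]`, which
shows why the mapping cannot be letter by letter: the digraph "sh"
corresponds to a single sound.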

Major advances are dependent on the availability of explicit linguistic rules
for allophonic variation and prosody and efficient implementations of these
in synthesis devices.
1.6. SPEECH RECOGNITION

Speech recognition is the problem of identifying the segments, words or
phrases in spoken utterances. General-purpose pattern-matching techniques
have made it possible to recognize an increasing number of isolated words,
but new techniques are needed to recognize ordinary connected speech.
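
One well-known general-purpose pattern-matching technique for isolated
words is template matching with dynamic time warping (DTW), which
compares an input with stored word templates while tolerating differences
in speaking rate. The text does not say which technique it has in mind,
so DTW is an assumption here. The templates are made-up one-dimensional
sequences standing in for real acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b: b[j-1] matches several of a
                cost[i][j - 1],      # stretch a: a[i-1] matches several of b
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[len(a)][len(b)]

# Hypothetical stored templates, one per vocabulary word.
TEMPLATES = {
    "yes": [1, 3, 4, 3, 1],
    "no": [4, 4, 2, 1, 1],
}

def recognize(signal):
    """Return the vocabulary word whose template is closest to the signal."""
    return min(TEMPLATES, key=lambda word: dtw_distance(signal, TEMPLATES[word]))
```

A slower rendition of "yes" such as `[1, 1, 3, 4, 4, 3, 1]` still matches
its template exactly after warping. The approach presupposes that word
boundaries are known, which is one reason it does not carry over directly
to connected speech.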

Notable speech recognition projects include:

HARPY, which recognized connected speech with a limited vocabulary from
five different speakers, after training on those speakers.

HEARSAY and HWIM contributed to the study of different control
structures for natural language.

At present, a more balanced picture has replaced the view that the
acoustic signal is impoverished and that speech recognition is impossible
without constant support from semantics, syntax and pragmatics.
1.7. MACHINE TRANSLATION

Machine translation has enjoyed increasing attention and has advanced
along with progress in semantic, syntactic and morphological processing.
Also, more powerful computers have made the dream of automatic translation
possible.

The belief that translation was a problem of code-breaking was soon
abandoned as the properties of the linguistic code came to be better
understood.

The main parameters of an MT system are whether it makes use of an
interlingua and whether the translation is direct or indirect:

Indirect: the analysis of the source language is not influenced by what
the target language is. Indirect systems without an interlingua use a
transfer module which maps between abstract source and target language
representations.

Direct: the analysis of the source language is geared to one particular
target language.
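
A transfer module that maps between abstract source and target language
representations can be sketched as the middle stage of a three-stage
pipeline. Everything below is a hypothetical toy: the representation, the
three-word input pattern and the small English-Spanish dictionary are
assumptions made only to show how analysis, transfer and generation are
kept apart.

```python
# Bilingual dictionary used by the transfer stage (toy, English to Spanish).
BILINGUAL = {"the": "el", "red": "rojo", "car": "coche", "book": "libro"}

def analyze(sentence):
    """Analysis: build an abstract representation of the source phrase.

    Only handles the pattern determiner + adjective + noun.
    """
    det, adjective, noun = sentence.lower().split()
    return {"det": det, "mod": adjective, "head": noun}

def transfer(source_rep):
    """Transfer: map the source representation to a target representation."""
    return {role: BILINGUAL[word] for role, word in source_rep.items()}

def generate(target_rep):
    """Generation: linearize using target word order (noun before adjective)."""
    return f"{target_rep['det']} {target_rep['head']} {target_rep['mod']}"

def translate(sentence):
    return generate(transfer(analyze(sentence)))
```

Here `translate("the red car")` returns `"el coche rojo"`. Word order is
handled entirely in `generate`, and `analyze` makes no reference to the
target language, so it could in principle be reused with a different
transfer dictionary and generator.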

EUROTRA, a system geared to handle several language pairs, has opted for
a modular design in which the statement of linguistic generalizations is
separated from the specification of the parsing and generation
algorithms.
1.8. CONCLUSION

What began as string manipulation now includes ambitious attempts at
simulating complex linguistic behavior. Linguistic science is being
reshaped by the growing understanding of cognitive processes that flows
from joint work in artificial intelligence, psychology and linguistics.

Computer applications of linguistic theory will also improve in quality and
increase in number.