COMPUTER APPLICATIONS OF LINGUISTIC THEORY

1.1. FACT AND FICTION

Natural language vs. computer language. Natural language is the basis of communication, and a computer language is not just another language of the same kind. The syntax of both can be modeled with the same types of formal systems. But natural language syntax is bound up with semantic ambiguity and sensitivity to context, whereas computer languages are designed to avoid both. So there is no formal system capable of ensuring a perfect semantic account of both kinds of language.

Warren Weaver in 1949 viewed translation as a code-breaking problem. A sentence encodes a message, which is the meaning of the sentence, so translation is decryption followed by encryption. But there is no key that supplies the mapping between the surface form of the sentence (the encrypted form) and a statement of the message in a universal system of concepts (its unencrypted form).

A range of natural language applications is now readily available on personal home computers, for example allowing PC users to interact with relational databases in natural language.

1.2. THE LEAP FROM LINGUISTIC THEORY TO PROGRAMS

Linguistic theory vs. computational applications. Generative linguistics has aimed to characterize the linguistic knowledge of an idealized speaker-hearer. Computational linguistics tries to mediate between competence theory and the particular type of linguistic performance attributable to machines, by turning linguistic theory into algorithms. The aim is to permit the simulation of linguistic behavior while obeying the linguistic constraints and generalizations implied in both linguistic theory and competence grammars. The machine does not encode the knowledge that underlies the behavior in humans.

Computer applications need a theoretical base which can ensure the desired behavior. Programs as complex as natural language understanding systems depend on linguistic technology, which is derived from linguistic theory.
The boundary between theory and application lies in computational linguistics.

1.3. COMPUTATIONAL LINGUISTICS

Computational linguistics is the branch of artificial intelligence (AI) concerned with the investigation and modeling of a cognitive capacity. The goal is to identify and characterize the classes of processes and the types of knowledge which are involved in the ability to communicate and assimilate information using natural language.

There are two problems in embedding linguistic knowledge in a computer implementation:
1. How to convert a competence grammar into a parser.
2. How to integrate multiple sources of linguistic knowledge in the analysis process.

Computational linguistics relies on insights from a number of disciplines within both computer science and linguistics. Its major areas of activity are parsing, language generation, natural language understanding, machine translation and speech.

1.4. PARSING

Parsing is the recovery of structure from a signal in which the structure is not apparent. It is a crucial step in any kind of natural language processing. Grammars provide an explicit definition of string membership in a language and of the association of strings with structures. Stating the grammar explicitly also makes it easier to correct and expand the grammar itself and to develop new parsing algorithms.

Research in the 1950s and 1960s established how the complexity of a language for the purposes of parsing relates to the form of the grammar which generates it (Chomsky 1959, 1963). For the most complex of these languages there are strings which cannot be analyzed by a computer, regardless of how much time and memory it has.

A way round the problem is to limit explicitly the set of sentences which are admitted to a system, that is, to define a sublanguage. The introduction of strictly defined sublanguages has improved machine translation, though this is only possible when the creation of the source language text can be controlled.
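The idea of a strictly defined sublanguage can be made concrete with a small sketch. The grammar, lexicon and function names below are hypothetical and chosen only for illustration; they do not come from any system discussed here. The sketch shows a tiny grammar that explicitly defines which strings belong to the sublanguage, and a top-down recognizer that rejects everything else.

```python
# Hypothetical toy sublanguage: the rules and words are illustrative assumptions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "valve": "N", "pump": "N", "operator": "N",
    "opens": "V", "closes": "V", "stops": "V",
}


def derive(symbol, words, start):
    """Return every position where `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:
        # Lexical category: it must match exactly one word of the input.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return {start + 1}
        return set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for child in rhs:
            # Advance through the right-hand side, keeping all possible end points.
            positions = {e for p in positions for e in derive(child, words, p)}
        ends |= positions
    return ends


def in_sublanguage(sentence):
    """True if the whole sentence is generated by the toy grammar."""
    words = sentence.lower().split()
    return len(words) in derive("S", words, 0)


print(in_sublanguage("the operator opens the valve"))
print(in_sublanguage("the operator quickly opens the valve"))
```

The second sentence is rejected only because "quickly" lies outside the controlled vocabulary, which is the kind of input the sublanguage approach has to exclude at the source or hand to a fallback technique. The recognizer assumes a grammar without left recursion; a left-recursive rule would make `derive` loop.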
A grammar-based parser can be constructed for a sublanguage, and "parse-fitting" techniques can be invoked when input which falls outside the sublanguage is encountered.

A "rapprochement" between parsing technology and linguistic theory can be achieved by developing parsers better suited to deal with complex grammars. But such a parsing algorithm would have to be capable of reversing the effects of long sequences of ordered applications of transformations.

PARSING SYSTEMS

The formalism that allowed this was the Augmented Transition Network (ATN) grammar, which is a Recursive Transition Network (RTN) grammar with the addition of registers and conditions, which make it possible to handle transformations (Woods, 1970). ATN grammars are formulated in such a way that the regularities they encode can be exploited in the parsing process.

An important development was the invention of the chart (Martin Kay), a data structure accessed by the parser during analysis, together with the advent of active chart parsing (Kaplan), which made it possible to explore different scheduling techniques for ATN parsing.

From a computational point of view, Lexical-Functional Grammar (LFG) is a development of ATN theory. From a linguistic point of view, LFG is an extension of base-generated syntax. LFG satisfies the needs both of linguistic adequacy and of ease of use in parsing and psychological modeling. The actions and conditions of the ATN are replaced by a set of constraints, a device that is a significant feature of several current versions of syntactic theory.

A merger of linguistic theory and parser design was accomplished by Marcus in his PARSIFAL system, whose point of departure was transformational grammar. He explains a number of constraints and conditions postulated in that theory on the basis of central properties of the parser design.

These systems are written in some dialect of the programming language LISP. More recently, a number of natural language systems have been developed in PROLOG.
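The chart described above, a data structure that records what has already been found over each stretch of the input, can be illustrated with a minimal sketch. This is not an ATN or an active chart parser: it swaps in the simpler CYK algorithm over a grammar in Chomsky normal form, and the grammar, lexicon and function names are hypothetical assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical grammar in Chomsky normal form (illustrative assumption).
# Each key is a pair of adjacent categories; the value is what they combine into.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
LEXICAL_RULES = {
    "the": {"Det"}, "a": {"Det"},
    "parser": {"N"}, "chart": {"N"}, "sentence": {"N"},
    "reads": {"V"},
    "with": {"P"},
}


def build_chart(words):
    """Fill a chart where chart[(i, j)] holds every category spanning words[i:j]."""
    chart = defaultdict(set)
    n = len(words)
    # Spans of width 1 come straight from the lexicon.
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICAL_RULES.get(word, set())
    # Wider spans are built from two adjacent, already recorded spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY_RULES.get((left, right), set())
    return chart


def recognize(sentence):
    """True if the chart contains an S spanning the whole input."""
    words = sentence.lower().split()
    return "S" in build_chart(words)[(0, len(words))]


print(recognize("the parser reads the sentence with a chart"))
```

Because each span is analyzed once and then looked up, the two attachments of "with a chart" (to the verb phrase or to the noun phrase) share their sub-analyses instead of being recomputed, which is the practical benefit of keeping a chart.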
A whole line of grammar formalisms (logic grammars) has been developed to make it simpler to express linguistic rules in a form that is appropriate for use with the PROLOG interpreter. These grammars have so far had little connection with ongoing research in linguistic theory.

There are as many different types of parsing as there are different aspects of linguistic structure.

1.5. TEXT-TO-SPEECH CONVERSION (speech synthesis)

The best examples of commercially available linguistic technology are found in the area of speech synthesis and text-to-speech conversion. Several areas of linguistic theory have contributed to text-to-speech conversion: a theory of speech production (for speech synthesis), and morphophonemic, phonological and prosodic rules.

The conversion has to be indirect: a text-to-speech device requires a specification of the correspondences between letters and sounds. Major advances depend on the availability of explicit linguistic rules for allophonic variation and prosody, and on efficient implementations of these rules in synthesis devices.

1.6. SPEECH RECOGNITION

Speech recognition is the problem of identifying the segments, words or phrases in spoken utterances. General-purpose pattern-matching techniques have made it possible to recognize an increasing number of isolated words. But techniques that are not yet known are needed to recognize ordinary connected speech.

Notable speech recognition projects include HARPY, which handled connected speech over a restricted vocabulary from five different speakers after training on those speakers, and HEARSAY and HWIM, which contributed to the study of different control structures for natural language.

At present, a more balanced picture has replaced the view that the acoustic signal is impoverished and that speech recognition is impossible without constant support from semantics, syntax and pragmatics.

1.7. MACHINE TRANSLATION

Machine translation has enjoyed increasing attention, and has advanced along with progress in semantic, syntactic and morphological processing.
Also, more powerful computers have made the dream of automatic translation possible. The belief that translation was a problem of code-breaking was soon abandoned as the properties of the linguistic code came to be better understood.

The main parameters of an MT system are whether it makes use of an interlingua or of a transfer module, which maps between abstract source and target language representations, and whether the translation is direct or indirect. In indirect translation, the analysis of the source language is not influenced by what target language is used. In direct translation, the analysis is geared to a particular pair of languages.

EUROTRA, a system geared to handle several language pairs, has opted for a modular design in which the statement of linguistic generalizations is separated from the specification of the parsing and generation algorithms.

1.8. CONCLUSION

Computer applications have moved beyond string manipulation and now include ambitious attempts at simulating complex linguistic behavior. Linguistic science is being reshaped by the growing understanding of cognitive processes which flows from joining together artificial intelligence, psychology and linguistics. Computer applications of linguistic theory will also improve in quality and increase in number.