corpus

As the main carrier of semantic information, the word is the central element of the utterance. Modern theoretical and applied research, from logic to morphology, pays great attention to it, because no other linguistic unit unites form and content so closely or plays such an important role as the word. Language, and especially its vocabulary, is constantly evolving. Words take on new meanings, and old meanings disappear. In most cases, new words reflect newly emerging concepts in science, technology, everyday life, social relations, politics and economics. The present linguistic situation, shaped by the computerization of society and by the so-called "information explosion" that has sharply expanded the channels of communication, makes it necessary to pay special attention to shifts in lexical meaning. Modern research methods and techniques produce new tools that were unknown to linguists of the past and that help us verify hypotheses and research results. In modern linguistics, the research tool is not only the dictionary, which records the word in its paradigmatic and syntagmatic relations, but also the concordance, which is based on a representative sample of texts. Although a change in the meaning of even a single word already poses a research problem, the words belonging to a specific domain can be represented as a system in which their meanings are connected in particular ways. In general, the applied branches of linguistics that serve various spheres of human activity focus primarily on one general problem: the processing of information in society. This information includes not only written texts but also oral speech, the most common means of communication. New information technologies enable us to study language from various sources (dictionaries, works of fiction, newspapers, etc.) and to enter and process large collections of texts, known as text corpora, with the help of computers.
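A concordance lists every occurrence of a word together with its immediate context, so the reader can compare uses side by side. The sketch below is a minimal KWIC (Key Word In Context) concordancer, assuming a naive tokenizer that lowercases the text and keeps only letters and apostrophes; the function names `tokenize` and `kwic` and the `window` parameter are hypothetical illustrations, not a particular concordance program's API.

```python
import re

def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens: list[str], keyword: str, window: int = 3) -> list[tuple[str, str, str]]:
    """Key Word In Context: (left context, keyword, right context) per occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

sample = ("Words take on new meanings. Old meanings disappear. "
          "A word is the main element of the utterance.")
for left, kw, right in kwic(tokenize(sample), "meanings", window=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>15} [{kw}] {right}")
```

Aligning the keyword in a single column is the traditional concordance layout; a real concordancer would also sort the lines by left or right context.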
Computer processing and special programs are extremely important for lexicology and lexicography, because they make compiling dictionaries, writing textbooks and carrying out literary research on ancient and modern authors much easier and more productive. Studies of semantics and syntax have also undergone serious changes as a result of these new possibilities. Corpus linguistics, which originated in applied linguistics, is based on the corpus: a large body of living language material that can be extracted from various sources and entered into a computer. It studies speech and language from a different angle, giving researchers a huge vocabulary to investigate and powerful new tools. At present, corpus linguistics, which studies the distribution of linguistic phenomena in different languages and makes it possible to obtain new and objective linguistic data, is becoming very important. The advantage of this field is that it avoids the subjectivity typical of traditional linguistics and relies on objective information. Some basic features of corpus linguistics, for example distributional methodology and the creation of concordances, were known long ago. However, corpus linguistics took shape as an independent field relatively recently. According to P. Baker, “in linguistics a corpus is a collection of texts (a ‘body’ of language) stored in an electronic database” (Baker 2006). The text corpus of a language is a collection of texts in that language, stored in electronic form and provided with an “apparatus criticus” (scientific definitions, literature cited, references). The apparatus built into the corpus is called its “markup” (layout) or “annotation”. If the markup is correct, it is easy to find in the corpus any word, phrase or grammatical structure that a language analysis requires.
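To illustrate why correct markup makes grammatical structures easy to find, the following is a minimal sketch that searches a part-of-speech annotated sentence for a sequence of tags. It assumes a hypothetical `word/TAG` markup format and a simplified, made-up tag set (`DET`, `ADJ`, `NOUN`, `VERB`, `PUNCT`); real corpora use their own formats and tag sets. `Token`, `parse_annotated` and `find_pattern` are illustrative names only.

```python
from typing import NamedTuple

class Token(NamedTuple):
    word: str
    tag: str  # part-of-speech label supplied by the (hypothetical) markup

def parse_annotated(line: str) -> list[Token]:
    """Parse space-separated 'word/TAG' items; the tag follows the last slash."""
    tokens = []
    for item in line.split():
        word, _, tag = item.rpartition("/")
        tokens.append(Token(word.lower(), tag))
    return tokens

def find_pattern(tokens: list[Token], tags: tuple[str, ...]) -> list[str]:
    """Return every word sequence whose tags exactly match the tag pattern."""
    n = len(tags)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if tuple(t.tag for t in window) == tags:
            hits.append(" ".join(t.word for t in window))
    return hits

annotated = "The/DET large/ADJ corpus/NOUN contains/VERB living/ADJ language/NOUN ./PUNCT"
print(find_pattern(parse_annotated(annotated), ("ADJ", "NOUN")))
# ['large corpus', 'living language']
```

Without the tags, the same query would require listing every possible adjective by hand; with them, a single tag pattern retrieves the whole grammatical construction.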
A text corpus is used to carry out statistical analysis, to check how linguistic rules operate in actual usage, to determine the usage of a particular sound, word or syntactic construction, to study phraseology and to gain an overview of a word in its linguistic environment. Any corpus helps to examine how people use a given word (lemma) in spoken and written language. Corpus research also helps to reveal the wide range of meanings of polysemous words in broad contexts and supports the identification of a word's meaning in a particular act of communication. Any investigation of a text corpus provides substantial material not only for compiling dictionaries but also for distinguishing between different varieties of a language. The following types of corpora can be distinguished: 1) Annotated Corpus; 2) Comparable Corpus; 3) Monitor Corpus; 4) Monolingual Corpus; 5) Multilingual Corpus; 6) Parallel (aligned) Corpus; 7) Reference Corpus; 8) Spoken Corpus; 9) Unannotated Corpus; 10) Speech Corpus. The British National Corpus (BNC) is one of the first large corpora in the world created with the participation of lexicographers. It contains about one hundred million words. The corpus also includes metatextual information, part-of-speech markup and a subcorpus of spoken language. It covers many different types of written and spoken British English and also records a great number of speech phenomena. Corpus linguistics provides material for various studies of language and its varieties and defines the basic method of analyzing texts on the basis of corpora (the Corpus-Based Approach). This approach, that is, linguistic research based on text corpora, focuses on the applied study of language and its functioning in real settings, which is important for language teaching.
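The most basic statistical analysis of a corpus is counting how often a lemma occurs, regardless of its inflected form. The sketch below groups word forms by lemma and normalizes the raw count to occurrences per million tokens. It assumes a tiny hypothetical lemma table (`LEMMAS`) in place of the lemma annotation a real corpus would supply; `lemma_frequencies` and `per_million` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical lemma table; an annotated corpus would normally supply lemmas.
LEMMAS = {"words": "word", "meanings": "meaning"}

def lemma_frequencies(text: str) -> tuple[Counter, int]:
    """Count tokens grouped by lemma; return (counts, total number of tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Forms missing from the table are treated as their own lemma.
    counts = Counter(LEMMAS.get(tok, tok) for tok in tokens)
    return counts, len(tokens)

def per_million(count: int, total: int) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / total

text = "A word has meanings. Words change; a meaning may vanish."
counts, total = lemma_frequencies(text)
for lemma in ("word", "meaning"):
    print(lemma, counts[lemma], per_million(counts[lemma], total))
# word 2 200000.0
# meaning 2 200000.0
```

Normalizing per million tokens is what makes counts comparable across subcorpora of different sizes, for example a small spoken subcorpus and a much larger written one.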
For example, corpus-based lexicographic analysis helps to reveal the contextual use of words, especially synonyms (e.g. small/little, big/large): how frequently they combine with other words and how regularly they occur in different styles. This makes it possible to define their meanings accurately. The corpus-based approach makes it possible to observe the behavior of words, phrases, grammatical categories, syntactic constructions, etc. in their natural environment, that is, in real rather than artificially constructed contexts. In addition, by applying statistical methods to a large amount of material, corpus studies make it possible to formulate, confirm or refute a hypothesis about a particular linguistic phenomenon. Moreover, researchers who use an existing corpus bypass the long and time-consuming phase of collecting material (surveying informants, working with dictionary card files or written texts, etc.).
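Comparing synonyms such as small and little usually starts with their collocates, the words that most often appear next to each one. The following is a minimal sketch that counts right-hand collocates within a fixed span. The toy text is invented for illustration and proves nothing about English usage; `right_collocates` and its `span` parameter are hypothetical names. A real study would run this over a large corpus and apply an association measure rather than raw counts.

```python
import re
from collections import Counter

def right_collocates(text: str, node: str, span: int = 1) -> Counter:
    """Count words occurring within `span` tokens to the right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Slicing past the end is safe; it simply yields fewer tokens.
            counts.update(tokens[i + 1:i + 1 + span])
    return counts

# Invented toy text, used only to show the mechanics.
toy = ("a little time passed. a small house stood there. "
       "she had little money and a small dog. little time was left.")
for node in ("small", "little"):
    print(node, right_collocates(toy, node).most_common(2))
```

Even on this toy text, the two adjectives attract different neighbors: little precedes mass nouns (time, money), while small precedes countable nouns (house, dog). A corpus makes it possible to check whether such a pattern holds at scale.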
