COURSE DESCRIPTION

Course code: LKK5002
Course group: C
Volume in ECTS credits: 6
Course valid from: 2013 06 11
Course valid to: 2016 06 11
Reg. No.:
Course type (compulsory or optional): Compulsory
Course level (study cycle): Master
Semester the course is delivered: First semester
Study form (face-to-face or distant): Face-to-face

Course title in Lithuanian: TEKSTYNŲ LINGVISTIKOS TEORIJA IR METODOLOGIJA
Course title in English: CORPUS LINGUISTICS. THEORY AND METHODOLOGY

Short course annotation in Lithuanian
Kurso tikslas – supažinti studentus su tekstynų sudarymo principais, jų įvairove ir jų taikymu lingvistikos, leksikografijos bei informacinių technologijų reikmėms. Jo metu gaunami tekstynų lingvistikos teorijos pagrindai, būtini tolesniam praktiniam kalbos analizės ir žodynų sudarymo darbui. Studentai išmokomi dirbti su kalbos vienetų paieškos tekstyne programine įranga, jie įpranta sudaryti, analizuoti ir klasifikuoti konkordansus, jais remiantis rengti leksikografinius bei kitokius kalbos vienetų aprašus.
(English translation: The aim of the course is to acquaint students with the principles of corpus compilation, the variety of corpora and their application for the needs of linguistics, lexicography and information technologies. The course provides the foundations of corpus linguistics theory that are necessary for further practical work in language analysis and dictionary compilation. Students are taught to work with software for searching linguistic units in a corpus; they become accustomed to compiling, analysing and classifying concordances and to using them to prepare lexicographic and other descriptions of linguistic units.)

Short course annotation in English
The course is meant to acquaint students with the basics of corpus linguistics. It starts with a description and analysis of the first corpora and the rules of corpus design. The variety of corpora is dealt with from the point of view of language, size and annotation (general vs. special, parallel vs. comparable, annotated vs. raw corpora). Students are taught to work with the available software tools for corpus analysis, such as concordancers and statistical measures, and to apply them to the linguistic units under investigation.

Prerequisites for entering the course
Introduction to linguistics, morphology, lexicology, syntax

Course aim
To present the basics of corpus compilation, the types of corpora and their application for the needs of linguistics, lexicography and information technologies

Links between study programme outcomes, course outcomes and criteria of learning achievement evaluation

Course outcome 1. Ability to see language as a whole comprised of lexis, semantics and grammar
  Study programme outcomes: 1.2. Ability to see language as a whole comprising all its levels
  Criteria of learning achievement evaluation: Analyses of the usage of lexical units in their linguistic and sociocultural context

Course outcome 2. To master linguistic and other disciplinary methods and their complementary use
  Study programme outcomes: 1.4. Ability to use the methods of other disciplines and their research; 1.5. Ability to apply linguistic research to other disciplinary research: to history, ethnology, psychology, sociology, computer science, etc.
  Criteria of learning achievement evaluation: Application of general deductive and inductive methods, adapted to the solution of the problems of corpus linguistics specifically and of the other social sciences and humanities in general

Course outcome 3. To get acquainted with and be able to use the IT tools for natural language processing, archiving, preservation, annotation and information extraction
  Study programme outcomes: 3.1. Acquaintance with the new IT tools and systems for natural language processing; 3.2. Ability to use the new IT tools for language processing, archiving, preservation, annotation and information extraction
  Criteria of learning achievement evaluation: Ability to use automatic tools for text and corpus analyses and for the extraction of linguistic information, and to interpret and evaluate that information

Course outcome 4. To compile, analyse, and critically evaluate linguistic data necessary for research
  Study programme outcomes: 3.4. Ability to extract, analyse and evaluate linguistic data (wordlists, concordances, collocations) and to compile lexicographic entries of lexical items
  Criteria of learning achievement evaluation: Collection and processing of the data important for the solution of a chosen problem

Course outcome 5. To analyse and describe the meaning of linguistic units based on their lexical, semantic, and grammatical features
  Study programme outcomes: 6.1. Ability to analyse and describe the meaning of linguistic units
  Criteria of learning achievement evaluation: Lexicographic analyses of lexical units, embracing their grammatical patterns and semantic systems

Course outcome 6. To be aware of the overall range of problems in linguistics and to be able to apply an interdisciplinary approach to linguistic research
  Study programme outcomes: 8.1. Ability to communicate and cooperate with researchers from other fields
  Criteria of learning achievement evaluation: Formalization of a linguistic problem, statistical analyses of data

Link between course outcomes and content

Course outcomes
1. Ability to see language as a whole comprised of lexis, semantics and grammar
2. Knowledge of the main methods of linguistics as well as of other disciplines, ability to apply linguistic methods to other disciplines and vice versa
3. Ability to use IT for language and text processing
4. Ability to obtain, analyse and evaluate research data
5. Ability to analyse the meaning of linguistic units with regard to their lexical, semantic and grammatical features
6. Knowledge of the problems of linguistic research, ability to solve them with the help of an interdisciplinary approach

Content (topics)
The sources of corpus linguistics, its distinctive features and its specific position between linguistic theory and methodology.
The concept of a corpus as the main resource for linguistic data and its main features.
Types of corpora, their representativeness, general and specific nature, types of annotation, variety of languages and sublanguages as the foundation of different types of corpora.
Text encoding and annotation. Annotated corpora, annotation at morphological, syntactic, semantic and pragmatic levels.
Multilingual corpora: parallel and comparable, the objectives and methods of their analyses.
Parallel corpora in the service of the theory and practice of translation, and multilingual lexicography.
Frequency lists, their application and analyses.
Keywords, their statistical and cultural analyses.
Descriptors as a tool for automatic indexing of texts.
The application of self-made specialised corpora, their frequency and keyword lists, as well as concordances of selected lexical units, for the investigation of linguistic problems.

Study (teaching and learning) methods
Teaching methods: lecturing, explanation, consultation, visualisation, problem solution, case studies, feedback on students' progress.
Learning methods: literature analyses, exercises, group discussion, joint projects.

Methods of learning achievement assessment
Test, presentation of individual investigations (homework tasks) and joint projects

Distribution of workload for students (contact and independent work hours)
Lectures: 30 hours
Seminars: 15 hours
Group work: 15 hours
Individual student work: 102 hours
Total: 162 hours

Structure of cumulative score and value of its constituent parts
Mid-term exam: 30 %, homework: 20 %, examination: 50 % of the final score.

Recommended reference materials
(Columns: No.; publication year; authors of publication and title; publishing house; number of copies in the university library, self-study rooms and other libraries)

Basic materials
1. 2010. Marcinkevičienė R. Lietuvių kalbos kolokacijos. VDU leidykla. Copies: 1
2. 2000. Marcinkevičienė R. Tekstynų lingvistika. Teorija ir praktika. // Darbai ir dienos 24, 7–63. VDU leidykla. Copies: 10
3. 2010. The Routledge Handbook of Corpus Linguistics. Routledge. Copies: 1
4. 2002. Biber D., Conrad S., Reppen R. Corpus Linguistics. Cambridge University Press. Copies: 1
5. 2005. Stubbs M. Words and Phrases. Corpus Studies of Lexical Semantics. Blackwell Publishing. Copies: 1

Course programme designed by
Prof. Rūta Petrauskaitė, Assoc. Prof. Andrius Utka, Department of the Lithuanian Language