Corpus Linguistics. Theory and Methodology

COURSE DESCRIPTION
Course code: LKK5002
Course group: C
Volume in ECTS credits: 6
Course valid from: 2013 06 11
Course valid to: 2016 06 11
Reg. No.:
Course type (compulsory or optional): Compulsory
Course level (study cycle): Master
Semester the course is delivered: First semester
Study form (face-to-face or distant): Face-to-face
Course title in Lithuanian: TEKSTYNŲ LINGVISTIKOS TEORIJA IR METODOLOGIJA
Course title in English: CORPUS LINGUISTICS. THEORY AND METHODOLOGY
Short course annotation in Lithuanian (translated into English)
The aim of the course is to acquaint students with the principles of corpus compilation, the variety of corpora, and their application to the needs of linguistics, lexicography and information technologies. The course provides the foundations of corpus linguistics theory needed for subsequent practical work in language analysis and dictionary compilation. Students are taught to work with software for searching for linguistic units in a corpus; they become accustomed to compiling, analysing and classifying concordances and, on that basis, to preparing lexicographic and other descriptions of linguistic units.
Short course annotation in English
The course acquaints students with the basics of corpus linguistics. It starts with a description and analysis of the first corpora and the principles of corpus design. The variety of corpora is then examined in terms of language, size and annotation (general vs. special, parallel vs. comparable, annotated vs. raw corpora). Students are taught to work with the available software tools for corpus analysis, such as concordances and statistical measures, and to apply them to the linguistic units under investigation.
Prerequisites for entering the course
Introduction to linguistics, morphology, lexicology, syntax
Course aim
To present the basics of corpus compilation, the types of corpora and their application to the needs of linguistics, lexicography and information technologies
Links between study programme outcomes, course outcomes and criteria of learning achievement evaluation

Study programme outcomes: 1.2. Ability to see language as a whole comprising all its levels
Course outcome: 1. Ability to see language as a whole comprised of lexis, semantics and grammar
Criteria of learning achievement evaluation: Analyses of the usage of lexical units in their linguistic and sociocultural context

Study programme outcomes: 1.4. Ability to use the methods of other disciplines and their research; 1.5. Ability to apply linguistic research to other disciplinary research: to history, ethnology, psychology, sociology, computer science, etc.
Course outcome: 2. To master linguistic and other disciplinary methods and their complementary use
Criteria of learning achievement evaluation: Application of general deductive and inductive methods, adapted to the solution of the problems of corpus linguistics specifically and of the other social sciences and humanities in general

Study programme outcomes: 3.1. Acquaintance with the new IT tools and systems for natural language processing; 3.2. Ability to use the new IT tools for language processing, archiving, preservation, annotation and information extraction
Course outcome: 3. To get acquainted with and be able to use the IT tools for natural language processing, archiving, preservation, annotation and information extraction
Criteria of learning achievement evaluation: Ability to use automatic tools for text and corpus analyses and extraction of linguistic information, and to interpret and evaluate that information

Study programme outcomes: 3.4. Ability to extract, analyse and evaluate linguistic data (wordlists, concordances, collocations) and to compile lexicographic entries for lexical items
Course outcome: 4. To compile, analyse, and critically evaluate linguistic data necessary for research
Criteria of learning achievement evaluation: Collection and processing of the data important for the solution of a chosen problem

Study programme outcomes: 6.1. Ability to analyse and describe the meaning of linguistic units
Course outcome: 5. To analyse and describe the meaning of linguistic units based on their lexical, semantic, and grammatical features
Criteria of learning achievement evaluation: Lexicographic analyses of lexical units, embracing their grammatical patterns and semantic systems

Study programme outcomes: 8.1. Ability to communicate and cooperate with researchers from other fields
Course outcome: 6. To be aware of the overall problems in linguistics and to be able to apply an interdisciplinary approach to linguistic research
Criteria of learning achievement evaluation: Formalization of a linguistic problem, statistical analyses of data

Link between course outcomes and content

Course outcome: 1. Ability to see language as a whole comprised of lexis, semantics and grammar
Content (topics): The sources of corpus linguistics and its distinctive features, as well as its specific placement between linguistic theory and methodology. The concept of a corpus as the main resource for linguistic data and its main features.

Course outcome: 2. Knowledge of the main methods of linguistics as well as of other disciplines, ability to apply linguistic methods to other disciplines and vice versa
Content (topics): Types of corpora, their representativeness, general and specific nature, types of annotation, variety of languages and sublanguages as the foundation of different types of corpora.

Course outcome: 3. Ability to use IT for language and text processing
Content (topics): Text encoding and annotation. Annotated corpora, annotation at morphological, syntactic, semantic and pragmatic levels.

Course outcome: 4. Ability to get, analyse and evaluate research data
Content (topics): Multilingual corpora: parallel and comparable, the objectives and methods of their analyses. Parallel corpora in the service of the theory and practice of translation, and multilingual lexicography.

Course outcome: 5. Ability to analyse the meaning of linguistic units with regard to their lexical, semantic and grammatical features
Content (topics): Frequency lists, their application and analyses. Keywords, their statistical and cultural analyses. Descriptors as a tool for automatic indexing of texts.

Course outcome: 6. Knowledge of the problems of linguistic research, ability to solve them with the help of an interdisciplinary approach
Content (topics): The application of self-made specialised corpora, their frequency and keyword lists for the investigation of linguistic problems, as well as concordances of selected lexical units.
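The frequency lists, keyword statistics and concordances named in the topics above can be illustrated with a minimal Python sketch. It is not the software used in the course, which this description does not name. The function names (tokenize, frequency_list, concordance, log_likelihood), the regular-expression tokenizer, the sample sentence and the reference-corpus figures are all assumptions made for illustration. Keyness is computed here with the log-likelihood statistic, which is one common choice among several.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())

def frequency_list(tokens):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def concordance(tokens, node, window=3):
    """Return keyword-in-context triples (left context, node, right context)."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness of a word with freq_a hits in a study corpus of
    size_a tokens and freq_b hits in a reference corpus of size_b tokens."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical sample text standing in for a small self-made corpus.
sample = ("A corpus is a collection of texts. A parallel corpus aligns texts "
          "in two languages, while a comparable corpus does not.")
tokens = tokenize(sample)

for word, count in frequency_list(tokens)[:3]:
    print(word, count)

for left, node, right in concordance(tokens, "corpus"):
    print(f"{left:>25} [{node}] {right}")

# Keyness of "corpus" against an invented reference corpus (10 hits in 2000 tokens).
print(log_likelihood(tokens.count("corpus"), len(tokens), 10, 2000))
```

A word that is equally frequent, relative to corpus size, in both corpora scores 0. Larger scores indicate a larger difference in relative frequency, although the statistic alone does not show in which corpus the word is over-represented.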
Study (teaching and learning) methods
Teaching methods: lecturing, explanation, consultation, visualisation, problem solving, case studies, feedback on students' progress.
Learning methods: literature analysis, exercises, group discussion, joint projects.
Methods of learning achievement assessment
Test, presentation of individual investigations (homework tasks) and joint projects
Distribution of workload for students (contact and independent work hours)
Lectures: 30 hours
Seminars: 15 hours
Group work: 15 hours
Individual student work: 102 hours
Total: 162 hours
Structure of cumulative score and value of its constituent parts
Mid-term exam: 30 %, homework: 20 %, examination: 50 % of the final score.
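As a worked illustration of these weights, the short Python sketch below combines three component marks into a cumulative score. The function name final_score and the sample marks are hypothetical. A 10-point marking scale is also assumed, since the description does not state the scale.

```python
def final_score(midterm, homework, exam):
    """Weighted cumulative score: 30 % mid-term exam, 20 % homework, 50 % examination."""
    return 0.30 * midterm + 0.20 * homework + 0.50 * exam

# Hypothetical marks on an assumed 10-point scale.
print(round(final_score(midterm=8, homework=9, exam=7), 2))
```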
Recommended reference materials
Columns: No.; publication year; authors of publication and title; publishing house; number of copies (in the university library, self-study rooms, other libraries).
Basic materials
1. 2010. Marcinkevičienė R. Lietuvių kalbos kolokacijos. VDU leidykla. Number of copies: 1.
2. 2000. Marcinkevičienė R. Tekstynų lingvistika. Teorija ir praktika. // Darbai ir dienos 24, 7–63. VDU leidykla. Number of copies: 10.
3. 2010. The Routledge Handbook of Corpus Linguistics. Routledge. Number of copies: 1.
4. 2002. Biber D., Conrad S., Reppen R. Corpus Linguistics. Cambridge University Press. Number of copies: 1.
5. 2005. Stubbs M. Words and Phrases. Corpus Studies of Lexical Semantics. Blackwell Publishing. Number of copies: 1.
Course programme designed by
Prof. Rūta Petrauskaitė, assoc. prof. Andrius Utka, Department of the Lithuanian Language