System of Pronominal Words in Czech
with Respect to German and English
Magda Razímová
Institute of Formal and Applied Linguistics
Charles University
Prague, Czech Republic
razimova@ufal.mff.cuni.cz
Outline of the talk

Introduction

Pronouns in the Prague Dependency Treebank 2.0

Personal pronouns

Other pronoun types

Pro-adverbs and pro-numerals

Application of the presented scheme to English and German

Final remarks
SLE 2006
Introduction

Pronouns and other pro-forms

• to replace or substitute other words, phrases, or sentences
• anaphoric and deictic functions
• pronouns, also pro-adjectives, pro-adverbs, and pro-numerals
• ‘closed’ classes
• semantically relevant regularities within the sub-classes (nobody-never-nowhere; everybody-always-everywhere)

Pro-forms in the Prague Dependency Treebank 2.0

• formal linguistic system for annotation of pro-forms
• making the present regularities explicit
• part of the deep-syntactic layer (tectogrammatical layer, t-layer)
• representation by a reduced set of (underlying) lemmas in combination with relevant attributes
PDT project – historical background

mid 1960s: Functional Generative Description (Petr Sgall et al.)
1994: Czech National Corpus
1995: PDT started
1998: PDT 0.5 pre-release
2001: PDT 1.0 released by LDC (LDC2001T10); manual annotation of morphology and surface syntax
2006: PDT 2.0 (just) released by LDC (LDC2006T01); interlinked morphological, surface-syntactic, and complex deep-syntactic annotation
PDT 2.0
Layers of annotation

tectogrammatical layer (t-layer): deep-syntactic dependency tree; 59 % of the a-layer data (3,165 doc., 49,431 sent., 833,195 tokens)
analytical layer (a-layer): surface-syntactic dependency tree; 75 % of the m-layer data (5,330 doc., 87,913 sent., 1,503,739 tokens)
morphological layer (m-layer): m-lemma and m-tag associated with each token; 7,110 textual documents (115,844 sent. with 1,957,247 tokens)
word layer: original text, segmented on word boundaries
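
The slide does not say which unit the 75 % and 59 % are measured in. A quick check, using only the counts quoted above, suggests that both figures match the document counts after rounding, while the shares by sentences or tokens come out somewhat different. The dictionary layout and the function name `shares` are illustrative choices, not part of the PDT distribution.

```python
# Counts copied from the slide: (documents, sentences, tokens) per layer.
LAYERS = {
    "m-layer": (7110, 115844, 1957247),
    "a-layer": (5330, 87913, 1503739),
    "t-layer": (3165, 49431, 833195),
}
UNITS = ("documents", "sentences", "tokens")


def shares(part, whole):
    """Percentage of layer `whole` that is also annotated at layer `part`."""
    return {
        unit: 100 * p / w
        for unit, p, w in zip(UNITS, LAYERS[part], LAYERS[whole])
    }


# Print the share for every counting unit so the three can be compared.
for part, whole in (("a-layer", "m-layer"), ("t-layer", "a-layer")):
    for unit, pct in shares(part, whole).items():
        print(f"{part} / {whole} by {unit}: {pct:.1f} %")
```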
lit.: He-was would went toforest. (He would have gone to the forest.)
Pronouns
in the Prague Dependency Treebank 2.0

at the t-layer, personal pronouns are treated separately from the other pronoun types

pro-adverbs and pro-numerals are represented in the same way as indefinite, negative, etc. pronouns

semantic features originally present in the word form are extracted and stored as values of inner attributes of the t-node that corresponds to the given word form
Personal pronouns in the PDT 2.0




all personal pronouns (whether present in the sentence or restored at the t-layer) are represented by nodes labeled with a single, ‘artificial’ lemma #PersPron
grammatical information that a personal pronoun expresses in the sentence is stored in the node attributes person, number, and gender
the attribute politeness distinguishes honorific from non-honorific usage
for example, the pronoun vy in vy jste přišel (you came, said politely to a single person) is represented as follows:
    #PersPron + 2nd (person) + singular + masc.anim. + polite
possessive pronouns that correspond to personal pronouns (jeho (his), náš (our)) are represented in the same way
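
The representation of polite vy can be sketched as a small record type. This is only an illustration: the class name `PersPronNode` and the helper `describe` are hypothetical, and the attribute values are copied from the wording of the slide, so they are not necessarily the value names used in the treebank files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersPronNode:
    """Hypothetical t-node for a personal pronoun (names are illustrative)."""
    person: str
    number: str
    gender: str
    politeness: str
    # Every personal pronoun shares the same artificial lemma.
    t_lemma: str = "#PersPron"


def describe(node: PersPronNode) -> str:
    """Render the node in the 'lemma + attribute values' notation of the slide."""
    return " + ".join(
        [node.t_lemma, node.person, node.number, node.gender, node.politeness]
    )


# 'vy' in 'vy jste přišel': polite address to a single male person.
vy_polite = PersPronNode(
    person="2nd", number="singular", gender="masc.anim.", politeness="polite"
)
print(describe(vy_polite))
```

The point of the design is that the surface form vy never appears in the node: everything it expresses is carried by the shared lemma and the four attributes.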
Personal pronouns and co-reference
at the t-layer, the representation of personal pronouns was completed with the annotation of co-reference (i.e. relations between nodes referring to the same entity)

Tím, že Evropská unie nechala ve rwandské operaci Francii na holičkách, podle Léotarda ukázala, že její politika nemá žádný africký rozměr.
(According to Léotard, by the fact that the European Union left France in the lurch concerning the Rwanda operation, [it] has shown that its politics has no African dimension.)
Other pronoun types in the PDT 2.0



indefinite, negative, interrogative, and relative pronouns
in the Czech pronoun system, individual meanings are expressed regularly by a relatively small group of prefixes that combine with a small set of bases
transparent correspondence between the semantic features and the formal composition of pronouns:
    indefinite prefix ně-: někdo (somebody) – něco (something) – nějaký (some)
    negative prefix ni-: nikdo (nobody) – nic (nothing)…
at the t-layer, pronouns with the same base element are grouped together; each pronoun group is represented by the lemma corresponding to the respective relative pronoun:
    e.g. někdo (somebody) and nikdo (nobody) are represented by the lemma kdo (who)
    corresponding possessive pronouns are represented in the same way
the semantic feature completing the reduced lemma is stored in the indeftype attribute
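
The reduction of prefixed pronouns to a base lemma plus an indeftype value can be sketched as follows. This is a toy illustration rather than the annotation procedure used for the PDT: the function name `reduce_pronoun`, the labels "indefinite" and "negative", and the handling of nic as an irregular form are assumptions made here, and the bases co and jaký are inferred from the forms něco and nějaký listed on the slide.

```python
from typing import NamedTuple, Optional


class PronounNode(NamedTuple):
    t_lemma: str
    indeftype: Optional[str]  # None: cannot be decided from the form alone


# Prefixes named on the slide; the labels are placeholders, not PDT values.
PREFIX_LABELS = {"ně": "indefinite", "ni": "negative"}
# Base forms: 'kdo' is given on the slide, 'co' and 'jaký' are inferred.
BASES = {"kdo", "co", "jaký"}
# 'nic' (nothing) is not a plain prefix + base string, so it is listed by hand.
IRREGULAR = {"nic": PronounNode("co", "negative")}


def reduce_pronoun(form: str) -> PronounNode:
    """Map a surface pronoun to a reduced lemma and an indeftype label."""
    if form in IRREGULAR:
        return IRREGULAR[form]
    if form in BASES:
        # A bare base may be interrogative or relative; context decides.
        return PronounNode(form, None)
    for prefix, label in PREFIX_LABELS.items():
        base = form[len(prefix):]
        if form.startswith(prefix) and base in BASES:
            return PronounNode(base, label)
    raise ValueError(f"form not covered by this sketch: {form!r}")


for form in ("někdo", "nikdo", "něco", "nějaký", "nic"):
    print(form, "->", reduce_pronoun(form))
```

Under these assumptions někdo and nikdo end up with the same lemma kdo and differ only in the attribute value, which is the grouping the slide describes.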
Other pronoun types
and the indeftype attribute


all indefinite, negative, interrogative, and relative pronouns are represented by only four lemmas at the t-layer
the reduced lemmas are completed by a value of the indeftype attribute (11 values in total)
Pro-adverbs and pro-numerals
in the PDT 2.0



in Czech, pro-adverbs (e.g. nikde (nowhere), nějak (somehow)) and pro-numerals (e.g. několik (a few)) express the same semantic features as pronouns
at the t-layer, they are represented in the same way as indefinite, negative, interrogative, and relative pronouns
another derivational relation can be seen between pro-adverbs with directional meaning and those of location; for example, the adverb odněkud (from somewhere) is represented as follows:
    lemma kde (where) + indef value (of the indeftype attribute) + functor DIR1 capturing the directional meaning
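
The three-part representation of odněkud can be sketched in the same spirit. Again this is a hedged illustration: the names `ProAdverbNode`, `QUESTION_FORMS`, `MARKERS`, and `analyse` are hypothetical, the table is deliberately limited to the single example from the slide, and the intermediate form odkud (from where) is an assumption introduced here to link odněkud to the lemma kde.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProAdverbNode:
    """Hypothetical t-node for a pro-adverb (names are illustrative)."""
    t_lemma: str
    indeftype: str
    functor: str


# Question-word form -> (t-lemma, functor). Only the slide's example is covered;
# 'odkud' (from where) is an assumed intermediate step, not taken from the slide.
QUESTION_FORMS = {"odkud": ("kde", "DIR1")}
# Indefinite marker -> indeftype label as written on the slide.
MARKERS = {"ně": "indef"}


def analyse(form: str) -> ProAdverbNode:
    """Split a pro-adverb into lemma, indeftype value, and functor."""
    for marker, indeftype in MARKERS.items():
        if marker in form:
            # Removing the marker yields the plain question-word form.
            question_form = form.replace(marker, "", 1)
            if question_form in QUESTION_FORMS:
                t_lemma, functor = QUESTION_FORMS[question_form]
                return ProAdverbNode(t_lemma, indeftype, functor)
    raise ValueError(f"form not covered by this sketch: {form!r}")


print(analyse("odněkud"))
```

The locational lemma kde is kept constant, and the directional meaning moves into the functor, so location and direction adverbs share one lemma.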
Zakládá-li si někdo na tom, že se vyhýbá cizím slovům, pak udělá nejlíp, když se nikdy nepodívá do Etymologického slovníku jazyka českého.
If someone finds it important that [he] avoids foreign words, then the best thing [he] can do is if [he] never looks in the Etymological Dictionary of the Czech Language.
Application of the presented scheme
to English and German
indefinite, negative, interrogative, and relative pronouns and other pro-forms are unproductive classes with (at least to a certain extent) transparent derivational relations in other languages as well
preliminary sketch of several English and German pronouns:



still unsolved: English anybody, German niemand and nirgendjemand …
Related conception: Helbig’s MultiNet (i)

a similar treatment of indefinite and negative pronouns, as two subtypes of the same entity, was also introduced in the MultiNet knowledge representation system

(Helbig, H. (2001). Die semantische Struktur natürlicher Sprache. Springer.)
(Negators and their antonyms, in Helbig (2001), p. 164)
Related conception: Helbig’s MultiNet (ii)
Sentential negation with kein (no)
Peter lent his tools to nobody.
Constituent negation with kein (no)
Peter buys no motorcycle, but a bike.
(in Helbig (2001), p. 170)
Final remarks

achievements:

all pro-forms in Czech divided into two groups:
• personal (and corresponding possessive) pronouns
• other pronoun types (and corresponding possessive pronouns) and other pro-forms

several pro-form analogies crossing the part-of-speech boundaries are explicitly marked in the annotation
verification of the annotation scheme on large-scale data

future work:

to elaborate the scheme for other languages in more detail, taking into consideration specific phenomena of the respective language
to describe the relations among the Czech and other pro-form systems (for example, for the purposes of machine translation)
http://ufal.mff.cuni.cz/pdt2.0/