DATR for linguists

Lexical Knowledge Representation and DATR¹
Computational Language CS292
Andrew Hippisley, March 2007
1. Introduction
2. Inference and default inference in DATR
3. The term inheritance
4. Querying lexical entries
5. The empty path and maximally underspecified sources of inheritance
6. Evaluable paths
7. Global inheritance
References
¹ This document is a modified extract from Brown and Hippisley (1995).
1. Introduction
DATR is a general purpose language for lexical knowledge representation which
stands independent of any particular theoretical framework. DATR was designed by
Roger Evans and Gerald Gazdar (see Evans and Gazdar 1996). It allows for the
treatment of irregularity, subregularity and regularity using default inheritance.
Generalisations are inherited by default, but can be overridden with local information
that is more specific. DATR's use of path extension to override defaults allows for the
natural encoding of 'blocking' and the 'elsewhere condition'. DATR has provided a
means for representing various areas of lexical knowledge using defaults. The main
features are outlined below.
2. Inference and default inference in DATR
Different DATR equation types are listed in (1). These are discussed in detail
in Evans and Gazdar (1996).
(1) DATR equation types
a. Node:<path> == value
b. Node1:<path1> == Node2:<path1>
   (usually written as Node1:<path1> == Node2)
c. Node1:<path1> == Node1:<path2>
   (usually written as Node1:<path1> == <path2>)
d. Node1:<path1> == Node2:<path2>
e. Node1:<path1> == "<path2>"
f. Node1:<path1> == "Node2:<path1>"
   (usually written as Node1:<path1> == "Node2")
g. Node1:<path1> == "Node2:<path2>"
In (1a)-(1g) Node, Node1 and Node2 are labels for nodes, which are points in a
hierarchy or network where one or more pairings of paths and values, or paths and
references to values, are to be found. By convention node labels begin with capitals.
Paths are enclosed in angle brackets and may contain zero or more attributes. In a
linguistically sophisticated theory attributes will be a well-defined set of features
(such as accusative or plural, for example).
A value can be an atomic symbol or a sequence of atomic symbols. What
appears on the right-hand side of an equation could be either a value (as in 1a), a
reference to an identical path at a different node (as in 1b), a reference to a different
path at the same node (as in 1c), or a reference to a different path at a different node
(as in 1d).
In (1e)-(1g) the quoted paths, or nodes and paths, denote global inheritance (see §7).
Reference to the new node or path is interpreted in the 'global context', as opposed to
the 'local context' as in (1a)-(1d). This means that the quoted path (see 1e) is a
reference to that same path at whatever node is being queried for information (for
querying see §4). Quoting a node with a path (1f and 1g) means that the quoted
node is set as the node where the value for the path is obtained, rather than either the
node at which the equation is to be found (the local context) or any possible node
which is queried. The distinction between local and global inheritance is not always
easy to grasp, which is why it is discussed further in §7.
In addition to the equation types we have illustrated in (1a)-(1g), the right-hand
sides may also consist of an unlimited concatenation or sequence containing any
combination of the right-hand sides to be found in (1a)-(1g).
DATR has seven rules of inference to get from equations such as (1a)-(1g) to
theorems. It also encodes the notion of inference by default. This is described in
Evans and Gazdar (1989a: 69), given here in (2). The italics are ours.
(2)
Default inference
"In addition to the conventional inference defined above, DATR has a
nonmonotonic notion of inference by default: each definitional
sentence about some node/path combination implicitly determines
additional sentences about all the extensions to the path at that node for
which no more specific definitional sentence exists in the theory."
(Evans and Gazdar 1989a: 69)
The ordering of attributes in a path is significant. One path is an extension of
another if it begins with the same attributes in the same order. For example, if one
path contains one attribute and another contains two, and the first attribute of the
longer path is identical to the single attribute of the shorter one, then the longer
path is an extension of the shorter: <syn cat> is an extension of <syn>.
What (2) states is that, if they are not specified
explicitly, any possible paths which are extensions of a path which is given explicitly
in a DATR fragment will have the same value as that path. Once extensions are
stipulated at a given node, then the value or reference to a value which these
extensions have as their right-hand sides will be inherited by all nodes that inherit that
information. In other words, the more specific (longest) path wins. However, that
node, and others which inherit from it, will still inherit the default value or reference
to a value for the other extensions of the path which are not explicitly specified. Some
of the implications of the way DATR uses the notion 'longest path wins' are discussed
in §5.
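The 'longest path wins' principle can be sketched outside DATR as a longest-prefix lookup. The following Python fragment is only an illustration under simplifying assumptions: a node is modelled as a dictionary from paths (tuples of attributes) to right-hand sides, and the node contents, the attribute names and the function name `lookup` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]

def lookup(node: Dict[Path, str], query: Path) -> Optional[str]:
    """Return the right-hand side of the longest defined path that is a prefix of `query`."""
    # Try the full path first, then ever shorter prefixes, down to the empty path.
    for length in range(len(query), -1, -1):
        prefix = query[:length]
        if prefix in node:
            return node[prefix]
    return None  # no defined path is a prefix of the query

# Hypothetical node: one default for <case> and one more specific override.
node = {
    ("case",): "default_ending",
    ("case", "acc"): "acc_ending",
}

print(lookup(node, ("case", "acc", "pl")))  # extension of <case acc>
print(lookup(node, ("case", "gen")))        # falls back to <case>
print(lookup(node, ("number",)))            # nothing defined: None
```

The override for `<case acc>` applies to all of its extensions, while every other extension of `<case>` still receives the default.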
3. The term inheritance
The idea of inheritance has been around for some time in both computer science in
general and work in Artificial Intelligence in particular (Fahlman 1979; Brachman
1985). When we talk of inheritance we mean, in a loose sense, the sharing of
information. Hierarchies and networks contain nodes where information is stored. A
particular node may inherit information from another node. For example, in a
particular human language it would be quite natural to say that words consist of a
stem plus a suffix, and this will also be the same for nouns. We can represent this in
DATR as in (3).
(3)
WORD:
<mor form> == "<stem>" "<suffix>".
NOUN:
<> == WORD.
This inheritance hierarchy can be represented diagrammatically as in Figure 1.
[Figure 1: A simple hierarchy (WORD dominating NOUN)]
For the language in question we also know that verbs consist of a stem plus a
suffix. We can therefore add a further node to our hierarchy, as in (4), and modify
figure 1 to figure 2.
(4)
WORD:
<mor form> == "<stem>" "<suffix>".
NOUN:
<> == WORD.
VERB:
<> == WORD.
[Figure 2: A less simple hierarchy (WORD dominating VERB and NOUN)]
We might wish to add adjectives to our example. We could also add syntactic
information about word class. We give the DATR representation of this hierarchy in
(5).
(5)
WORD:
<mor form> == "<stem>" "<suffix>".
NOUN:
<> == WORD
<syn cat> == n.
ADJ:
<> == WORD
<syn cat> == a.
VERB:
<> == WORD
<syn cat> == v.
What we have so far is a hierarchy of information where the nodes NOUN,
VERB and ADJ inherit information from a higher source. This is often referred to as a
mother-daughter relationship, and so the notion of inheritance here is tied up with
genetic inheritance.
Other kinds of inheritance are possible. Multiple inheritance involves a node
acquiring information from more than one source. As information from the two
sources may be contradictory this can lead to problems over deciding which has
precedence. DATR makes use of a particular kind of multiple inheritance, namely
orthogonal multiple inheritance. This means that a DATR representation must always
be explicit about what kind of information is required from which source, thereby
avoiding such contradictions.
As a concrete linguistic example of multiple inheritance, we could take an
instance in which a particular noun inherits information as part of a morphological
hierarchy. It will inherit information directly from the node NOUN, which in turn
inherits from the node WORD. However, it will get its information about its
semantics from another node which is not part of the morphological hierarchy. The
representation of the noun DOG might look something like (6).
(6)
Dog:
<> == NOUN
<stem> == dog
<sem> == FURRY_ANIMAL.
In this example Dog inherits all possible information from NOUN, except its value for
stem, which is stated locally, and its semantics, which it obtains from the node
FURRY_ANIMAL, which could be part of another hierarchy. We could represent the
inheritance relations for Dog diagrammatically as in figure 3.
[Figure 3: Multiple inheritance (WORD dominating NOUN, VERB and ADJ; Dog inheriting from both NOUN and FURRY_ANIMAL)]
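The way each path at Dog is routed to exactly one source can be sketched in Python. This is a simplified model, not a DATR implementation: it handles only atomic values and unquoted node references, it leaves out the <mor form> equation at WORD (which needs quoted paths), and the content given for FURRY_ANIMAL, the type names and the function name `resolve` are all hypothetical.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]
Rhs = Tuple[str, str]  # ("value", atom) or ("node", node_name)
Theory = Dict[str, Dict[Path, Rhs]]

THEORY: Theory = {
    "WORD": {},
    "NOUN": {(): ("node", "WORD"), ("syn", "cat"): ("value", "n")},
    # Hypothetical content: the source does not say what FURRY_ANIMAL defines.
    "FURRY_ANIMAL": {("sem", "covering"): ("value", "fur")},
    "Dog": {
        (): ("node", "NOUN"),
        ("stem",): ("value", "dog"),
        ("sem",): ("node", "FURRY_ANIMAL"),
    },
}

def resolve(theory: Theory, node: str, path: Path) -> Tuple[str, List[str]]:
    """Follow node references for `path`; return the value and the nodes visited."""
    visited = [node]
    while True:
        equations = theory[node]
        # The longest defined prefix of the path decides where to look next.
        match = next(
            (path[:n] for n in range(len(path), -1, -1) if path[:n] in equations),
            None,
        )
        if match is None:
            raise LookupError(f"{node}:<{' '.join(path)}> is undefined")
        kind, target = equations[match]
        if kind == "value":
            return target, visited
        node = target  # same path, different node, as in equation type (1b)
        visited.append(node)

for p in [("stem",), ("syn", "cat"), ("sem", "covering")]:
    print(p, resolve(THEORY, "Dog", p))
```

Each query is answered by a single chain of nodes: `<syn cat>` goes through NOUN, while any extension of `<sem>` goes to FURRY_ANIMAL, so the two sources never compete for the same path.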
The purpose of this discussion is to explain what the term 'inheritance' may
mean. When people first come to use a representation language such as DATR, they
will bring with them particular conceptions about what 'inheritance' means. As a
metaphor inheritance evokes strong associations in people's minds, and the
associations that they make can influence the way they think about inheritance as it is
used when talking of inheritance hierarchies and networks.
Many people would associate inheritance with hierarchies (but not networks),
as the metaphor of gene inheritance is strongest for them. There is an association
which assumes that inheritance can only be from an immediate ancestor, as we,
strictly speaking, only inherit our genetic properties from our parents. If people
conceive of inheritance in this way, they rule out anything that is not involved in
grandmother-mother-daughter relations. To compound the problem, researchers who
have used inheritance of some kind or other have traditionally made little distinction
between hierarchies, networks, lattices and so on.
It is, however, possible to understand inheritance in terms of property and
wealth. In this sense it is possible to inherit money directly from an aunt, or even
from someone who is not related to you.
The different kinds of inheritance referred to in knowledge representation can
be understood in terms of the different metaphors evoked here. For example, it is
possible to understand default inheritance, because of its hierarchical nature, in terms
of genetic inheritance. It is easier to understand multiple inheritance in terms of the
property inheritance metaphor, because it does not necessarily involve inheritance
from an immediate ancestor.
4. Querying lexical entries
When we talk of querying a node, we mean asking for inferable information about
that node. When we are dealing with linguistic theories this will mean querying
lexical entry nodes for information that has been declared to be interesting. In many
implementations of DATR the way of specifying sensible queries is to declare 'show'
paths which state what kind of information you would wish to know about a node.
For example, to query the paths <stem>, <singular> and <plural> automatically, type:
#show <stem> <singular> <plural>.
It is possible to query nodes in the representation of the theory which are not
lexical entries and for which it does not make sense to make certain queries. For
instance, it does not make sense to ask for the stem of the node NOUN, because this
node does not specify a stem. Normally, one would not wish to query the nodes at
which more general information is stored for others to inherit from.
'Hide' declarations are used to hide nodes which do not need to be queried:
#hide NOUN Count_Noun.
5. The empty path and maximally underspecified sources of inheritance
In order to represent the default source of inheritance, the empty angle brackets are
used. The term 'empty path' is also used. If we consider the equation type (1b) again,
where Node1 inherits its value by reference to an identical path at a different node
(Node2), it is clear that the default source of inheritance is that other node which a
particular node specifies as the place at which to find the value for the 'empty path'.
Given the principle of path extension, all paths are extensions of the empty path, and
so the value for all paths is inherited from the default source, unless that extension is
explicitly stated in an equation at the inheriting node. This means, for example, that
Node2 in the equation Node1:<> == Node2 is the default source (which could also be
called the maximally underspecified source), and that Node1 will inherit the values for
all paths at Node2, unless additional equations at Node1 specify otherwise.
Most of the time it will be obvious what the maximal source of inheritance is,
but there are two situations where it is less clear. In the first, consider a hierarchy
where there are two 'high' nodes A and B which store information that is generalised
over a sub-node C. Now we want C to inherit all the paths from both nodes. In other
words, we want two maximally underspecified sources of inheritance, as represented
in figure 4. As mentioned in §3, DATR makes use of orthogonal multiple
inheritance (Touretzky, 1986) to avoid information contradictions. This means that it
is impossible to specify two maximally underspecified sources of inheritance, just as
it is impossible to specify different sources of inheritance for the same path.
In figure 4 there are different paths specified at the nodes A and B. The use of
the empty path twice at node C is equivalent to stating that this node should get the
values for all paths from A and the values for all paths from B. Intuitively it appears
that there might be no contradiction, as the paths at A and B are different. However,
the paths <a> and <b> do have a status at node B: they are undefined there. The fact
that they are undefined at B and have values stipulated at A means that there is a
contradiction. Equally, the paths <c> and <d> are undefined at node A, which
contradicts the definitions at B.
In order to get round the problem posed by figure 4, it must be decided which
of the two nodes A or B is the maximally underspecified source of inheritance. The
sub-node C could then inherit all possible paths from this node and a subset of
information from the other one.
A:
    <a> == 1
    <b> == 2
B:
    <c> == 3
    <d> == 4
C:
    <> == A
    <> == B
    <e> == 5
Figure 4
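The restriction that rules out node C in figure 4 amounts to saying that a node may not define the same left-hand path twice. A minimal Python sketch of such a check follows. The function name `duplicate_paths` and the repaired version of C are illustrative assumptions; the repair takes A as the maximally underspecified source and inherits only <c> and <d> from B.

```python
from collections import Counter
from typing import List, Tuple

Path = Tuple[str, ...]
Equation = Tuple[Path, str]  # (left-hand path, right-hand side as written)

def duplicate_paths(equations: List[Equation]) -> List[Path]:
    """Return every left-hand path that is defined more than once at a node."""
    counts = Counter(path for path, _ in equations)
    return [path for path, n in counts.items() if n > 1]

# Node C exactly as in figure 4: the empty path is given two sources.
NODE_C = [((), "A"), ((), "B"), (("e",), "5")]

# One possible repair (an assumption): A is the default source,
# and only the paths <c> and <d> are taken from B.
REPAIRED_C = [((), "A"), (("c",), "B"), (("d",), "B"), (("e",), "5")]

print(duplicate_paths(NODE_C))      # the empty path is reported
print(duplicate_paths(REPAIRED_C))  # no conflicts
```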
It may also be necessary to choose attributes which specify the kind of
information that is being inherited. Consider the following fragment representing an
area of the phonological system of Russian.
(7a)
Fricative:
<> == C
<continuant> == '[+continuant]'
<strident> == '[-strident]'.
(7b)
Velar_C:
<> == C
<coronal> == Labial_C
<anterior> == Palatoalv_C.
(7c)
X:
<> == Velar_C
<continuant> == Fricative
<strident> == Fricative.
The node C is the top node in a hierarchy which uses underspecification to define the
consonants of Russian. The node X (7c), which represents the phoneme /x/, inherits
its general information from the node Velar_C (7b). As it is a fricative it also needs to
inherit from the Fricative node. Fricatives are [+continuant] (to
distinguish them from stops) and [-strident] (to distinguish them from affricates). The
node X therefore must multiply inherit from Velar_C and Fricative. However, in a
sense both Velar_C and Fricative could be considered main sources of inheritance.
There is no hierarchical connection between these two nodes. As only one maximally
underspecified source of inheritance is permitted, this is given as Velar_C, and X
must multiply inherit from Fricative for the paths <continuant> and <strident>
(7c). Unfortunately this appears to be introducing redundancy into the system.
One solution here would be to prefix paths inheriting from Velar_C with the
attribute place and have X inherit all other information from Fricative, so that (7c)
now looks like (8).
(8)
X:
<> == Fricative
<place> == Velar_C.
Summary
DATR forces researchers to be explicit about the kind of information inherited. With
multiple inheritance it is not possible to specify two nodes as providing the same kind
of information. If a node is not the main source of inheritance, then it may be
necessary to use an attribute to identify the kind of information to be inherited from
that node.
6. Evaluable paths
In addition to the equation types in (1a)-(1g) DATR implementations provide for
'evaluable paths' where the value of a particular path can be evaluated and then added
into another path upon evaluation. This enables values available elsewhere in a
DATR network to be used as attributes in a path and the declaration of
interdependencies determined by the presence of particular information.
As an example of an evaluable path, we consider part of the fragment from
Brown and Hippisley (1994: 70). In Russian, there is a phonological distinction
between palatalised ('soft') and non-palatalised ('hard') consonants. Whether a
stem-final consonant is soft or hard has consequences for the morphology. For example, a
hard class I noun has genitive plural in -ov, but a soft class I noun has genitive plural
in -ej. Now there are some consonants which are phonologically hard, but which
behave morphologically as though they were soft. Noun stems of declension I
ending in such consonants attach -ej in the genitive plural. One such consonant is the
voiced palatoalveolar fricative /ž/. Thus the genitive plural of muž 'husband' is mužej,
and not *mužov. This phonology-morphology interdependency can work the other
way. Thus the consonant /j/ is phonologically soft but morphologically hard, so that
the genitive plural of tramvaj 'tram' is tramvajov.² Note that /j/ can also appear as a
special suffix that creates plural stems, as in brat- (sing.) 'brother' > bratj- (plural).
Again, /j/ is morphologically hard so that the genitive plural is bratjov.
In order to get the right value for morphological hardness, i.e. hard or soft, we
have to attach the condition that if the consonant of the lexical entry in question is
phonologically soft then we want soft, and if hard then hard. However, if it is soft
but a /j/ then we want hard, and if it is hard and /ž/ then we want soft. Thus we will
have a node (10) which contains two possible values, soft and hard, and the conflict
arising as to which one to attach will be resolved by referring to information specific
to the lexical entry in question.
In (9a) information from a lexical entry is evaluated and added into the path.
Evaluation of the paths within the evaluable path on the right-hand side of (9a) will
determine which path at the node MORPHARD (10) is referenced for the value of the
path <mor stem hardness>.
² The forms given here are in phonological transcription, not standard orthography or transliteration.
(9)
NOUN:
a. <mor stem hardness> == MORPHARD:<"<phon stem hardness>" "<suffix pl>" "<infl_root final>">
...
(10)
MORPHARD:
a. <soft> == soft
b. <soft j> == hard
c. <soft none j> == hard
d. <hard> == hard
e. <hard none š> == soft
f. <hard none ž> == soft.
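How (9a) and (10) interact can be sketched in Python: the three embedded paths are evaluated at the lexical entry, their values are put together as a new path, and that path is looked up at MORPHARD with the longest defined prefix winning. The sketch makes several assumptions: each embedded path evaluates to a single atom, 'none' is the value of <suffix pl> when there is no plural stem suffix, and the attribute values given for the three entries are illustrative rather than taken from the original fragment.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]

# The node MORPHARD from (10), as a mapping from paths to values.
MORPHARD: Dict[Path, str] = {
    ("soft",): "soft",
    ("soft", "j"): "hard",
    ("soft", "none", "j"): "hard",
    ("hard",): "hard",
    ("hard", "none", "š"): "soft",
    ("hard", "none", "ž"): "soft",
}

def longest_prefix_value(node: Dict[Path, str], path: Path) -> str:
    """Default inference: use the longest defined path that is a prefix of `path`."""
    for n in range(len(path), -1, -1):
        if path[:n] in node:
            return node[path[:n]]
    raise LookupError(f"<{' '.join(path)}> is undefined")

def mor_stem_hardness(entry: Dict[str, str]) -> str:
    """Model of (9a): build the evaluable path from three values of the entry."""
    path = (
        entry["phon stem hardness"],
        entry["suffix pl"],
        entry["infl_root final"],
    )
    return longest_prefix_value(MORPHARD, path)

# Illustrative entries; the attribute values are assumptions.
muz = {"phon stem hardness": "hard", "suffix pl": "none", "infl_root final": "ž"}
tramvaj = {"phon stem hardness": "soft", "suffix pl": "none", "infl_root final": "j"}
plain_hard = {"phon stem hardness": "hard", "suffix pl": "none", "infl_root final": "n"}

for name, entry in [("muž", muz), ("tramvaj", tramvaj), ("plain hard stem", plain_hard)]:
    print(name, mor_stem_hardness(entry))
```

Only the exceptional combinations need their own equations at MORPHARD. Any other path that begins with `hard` or `soft` falls back to (10d) or (10a).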
Care must be taken that paths to be evaluated in evaluable paths are defined.
For (11) <phon stem hardness> is undefined, as only its extension
<phon stem hardness sg> is specified in the lexical entry. If the value for
<phon stem hardness> is stated to be 'hard' by a default elsewhere in the network,
then the noun in (11) will inherit the value 'hard' for <phon stem hardness>, which
may conflict with the information given by the path extension.
(11)
Kost´:
<> == N_III
<gloss> == bone
<infl_root> == kost´
<infl_root final> == t´
<stress> == Stress_3i
<phon stem hardness sg> == soft
<sem animacy> == inanimate.
Summary
There may be occasions when we wish to state that the value for a path is dependent
on the values that other paths may have. To encode this kind of dependent
knowledge, we use evaluable paths. We must be careful that the paths to be evaluated
are defined elsewhere in the network. Furthermore, if there are examples of paths
elsewhere in the network which extend any of the paths within an evaluable path,
these will not be evaluated.
7. Global inheritance
In §3 we considered the term 'inheritance' and the possible ways in which the
metaphor of inheritance could be understood. In example (3) in §3 a toy fragment
was given in which the generalisation was stated that a word might consist of a stem
plus a suffix. It is obvious that at the node which generalises over all words,
allowing for overrides of the default, the actual value which is the realization of the
attribute stem cannot be stated, as it will differ from lexical item to lexical item. It is
possible to understand global inheritance as a statement to the effect that the value for
a particular path is found by determining the value for another path at the node which
is being queried (see §4). For instance, we could add to our toy example (5) in §3 by
stating that by default there is no suffix (12).
(12)
WORD:
<suffix> ==
<mor form> == "<stem>" "<suffix>".
NOUN:
<> == WORD.
If we consider a noun like DOG we would find that, by default, its morphological
form is just the stem (13).
(13)
Dog:
<> == NOUN
<stem> == dog
<sem> == FURRY_ANIMAL.
In the DATR equation at WORD we see that the path "<stem>" is quoted, because its
value will depend on the value specified at the query node Dog. The path "<suffix>"
is also quoted. In this case the value for <suffix>, which is nothing, is inherited by
the node Dog from WORD via NOUN. If the paths were not quoted, this would be 'local
inheritance'. Not quoting the paths means that the value is obtained locally at the
node WORD. So the value for <mor form> would be the concatenation of the value for
<stem> as specified at WORD, or inherited by WORD from a higher node, and the value
of <suffix> as specified at WORD. Of course, there is no value specified for <stem> at
WORD and so the equation could not be evaluated.
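The contrast between quoted and unquoted paths in (12) and (13) can be made concrete with a small Python model. It is a sketch under simplifying assumptions, not a DATR implementation: it covers only atoms, unquoted node references, and local or global path references, it omits the <sem> equation at Dog, and the names `make_theory` and `query` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]
# Right-hand-side items: ("atom", str), ("node", str), ("local", Path), ("global", Path)
Item = Tuple[str, object]
Theory = Dict[str, Dict[Path, List[Item]]]

def make_theory(path_kind: str) -> Theory:
    """Fragments (12) and (13); `path_kind` is "global" (quoted) or "local" (unquoted)."""
    return {
        "WORD": {
            ("suffix",): [],  # by default there is no suffix
            ("mor", "form"): [(path_kind, ("stem",)), (path_kind, ("suffix",))],
        },
        "NOUN": {(): [("node", "WORD")]},
        "Dog": {(): [("node", "NOUN")], ("stem",): [("atom", "dog")]},
    }

def query(theory: Theory, node: str, path: Path,
          global_node: Optional[str] = None) -> List[str]:
    """Evaluate node:<path>; `global_node` is the node originally queried."""
    if global_node is None:
        global_node = node
    equations = theory[node]
    match = next(
        (path[:n] for n in range(len(path), -1, -1) if path[:n] in equations), None
    )
    if match is None:
        raise LookupError(f"{node}:<{' '.join(path)}> is undefined")
    extension = path[len(match):]
    atoms: List[str] = []
    for kind, arg in equations[match]:
        if kind == "atom":
            atoms.append(arg)
        elif kind == "node":    # same path at another node
            atoms += query(theory, arg, path, global_node)
        elif kind == "local":   # unquoted path: stay at the current node
            atoms += query(theory, node, arg + extension, global_node)
        elif kind == "global":  # quoted path: go back to the queried node
            atoms += query(theory, global_node, arg + extension, global_node)
    return atoms

print(query(make_theory("global"), "Dog", ("mor", "form")))
try:
    query(make_theory("local"), "Dog", ("mor", "form"))
except LookupError as err:
    print("unquoted paths:", err)
```

With quoted paths the query returns the single atom `dog`, because `<stem>` is looked up at Dog and `<suffix>` yields the empty default from WORD via NOUN. With unquoted paths the lookup of `<stem>` stays at WORD, where nothing is defined, so the query fails.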
References
Brachman, Ronald J. 1985. I lied about the trees. Or, defaults and definitions in
knowledge representation. AI magazine 6. 80-93.
Brown, Dunstan and Andrew Hippisley. 1994. Conflict in Russian Genitive Plural
Assignment: A Solution Represented in DATR. Journal of Slavic Linguistics 2.
48-76.
Brown, Dunstan and Andrew Hippisley. 1995. DATR for linguists. Deliverable for
ESRC grant # R000233633 and Leverhulme Trust grant #F.242M.
Evans, Roger and Gerald Gazdar. 1989a. Inference in DATR. Proceedings of the 4th
Conference of the European Chapter of the Association for Computational
Linguistics, 66-71. Manchester, England.
Evans, Roger and Gerald Gazdar. 1989b. The semantics of DATR. In: A. G. Cohn
(ed.) Proceedings of the Seventh Conference of the Society for the Study of
Artificial Intelligence and Simulation of Behaviour, 79-87. London:
Pitman/Morgan Kaufmann.
Evans, Roger and Gerald Gazdar. 1996. DATR: A Language For Lexical Knowledge
Representation. Computational Linguistics 22 (2). 167-216.
Fahlman, Scott E. 1979. Representing and using real-world knowledge. In Patrick H.
Winston and R. H. Brown (eds.), Artificial Intelligence: An MIT perspective.
Volume 1, 451-70. Cambridge, Mass: MIT Press.
Gazdar, G. 1989b. An introduction to DATR. In: Evans, R. and Gazdar, G. The DATR
Papers. Brighton: University of Sussex Cognitive Science Research Paper, CSRP
139 (1990). 1-14.
Gazdar, Gerald. Forthcoming. Ceteris paribus. To appear in J. A. W. Kamp and C.
Rohrer (eds.) Aspects of computational linguistics. Berlin: Springer.
Jenkins, Elizabeth. 1990. Enhancements to the Sussex Prolog DATR Implementation.
In: Evans, R. and Gazdar, G. The DATR Papers. Brighton: University of Sussex
Cognitive Science Research Paper, CSRP 139 (1990). 41-62.
Touretzky, David S. 1986. The Mathematics of Inheritance Systems. London: Pitman.