Lexical Knowledge Representation and DATR1 Computational Language CS292 Andrew Hippisley, March 2007 1. Introduction ......................................................................................................... 2 2. Inference and default inference in DATR ........................................................ 2 3 The term inheritance .......................................................................................... 3 4 Querying lexical entries ...................................................................................... 6 5 The empty path and maximally underspecified sources of inheritance......... 7 6 Evaluable paths ................................................................................................... 9 7 Global inheritance ............................................................................................. 11 References .................................................................................................................. 11 1 This document is a modified extract from Brown and Hippisley (1995). 1 1. Introduction DATR is a general purpose language for lexical knowledge representation which stands independent of any particular theoretical framework. DATR was designed by Roger Evans and Gerald Gazdar (see Evans and Gazdar 1996). It allows for the treatment of irregularity, subregularity and regularity using default inheritance. Generalisations are inherited by default, but can be overridden with local information that is more specific. DATR's use of path extension to override defaults allows for the natural encoding of 'blocking' and the 'elsewhere condition'. DATR has provided a means for representing various areas of lexical knowledge using defaults. The main features are outlined below. 2. Inference and default inference in DATR Different DATR equation types are listed in (1). These are discussed in detail in Evans and Gazdar (1996). (1) a. b. c. d. e. f. g. DATR equation types Node:<path> == value Node1:<path1> == Node2:<path1> (usually written as Node1:<path1> == Node2) Node1:<path1> == Node1:<path2> (usually written as Node1:<path1> == <path2>) Node1:<path1> == Node2:<path2> Node1:<path1> == "<path2>" Node1:<path1> == "Node2:<path1>" (usually written as Node1:<path1> == "Node2") Node1:<path1> == "Node2:<path2>" In (1a)-(1g) Node, Node1 and Node2 are labels for nodes, which are points in a hierarchy or network where one or more pairings of paths and values, or paths and references to values, are to be found. By convention node labels begin with capitals. Paths are enclosed in angle brackets and may contain zero or more attributes. In a linguistically sophisticated theory attributes will be a well-defined set of features (such as accusative or plural, for example). A value can be an atomic symbol or a sequence of atomic symbols. What appears on the right-hand side of an equation could be either a value (as in 1a), a reference to an identical path at a different node (as in 1b), a reference to a different path at the same node (as in 1c), or a reference to a different path at a different node. In (1e)-(1f) the quoted paths or nodes and paths denote global inheritance (see §7). Reference to the new node or path is interpreted in the 'global context', as opposed to the 'local context' as in (1a)-(1d). This means that the quoted path (see 1e) is a reference to an identical path at whatever node is being queried for information (for querying see § 2.2). Quoting a node with a path (1f and 1g) means that that quoted 2 node is set as the node where the value for the path is obtained rather than either the node at which the equation is to be found (the local context) or any possible node which is queried. Significantly, it is not always easy for people to grasp the distinction between local and global inheritance and this is why it is discussed in §7. In addition to the equation types we have illustrated in (1a)-(1g) the righthand sides may also consist of an unlimited concatenation or sequence containing any combination of the right-hand sides to be found in (1a)-(1g). DATR has seven rules of inference to get from equations such as (1a)-(1g) to theorems. It also encodes the notion of inference by default. This is described in Evans and Gazdar (1989b: 69), given here in (2). The italics are ours. (2) “Default inference” In addition to the conventional inference defined above, DATR has a nonmonotonic notion of inference by default: each definitional sentence about some node/path combination implicitly determines additional sentences about all the extensions to the path at that node for which no more specific definitional sentence exists in the theory. " (Evans and Gazdar 1989b: 69) The ordering of attributes in a path is significant. For example, if there are two paths and one contains one attribute and the other contains two attributes and the first attribute in the path which contains two attributes is identical to the attribute in the path which contains one attribute, then the path with two attributes is an extension of the path with one attribute. What (2) states is that, if they are not specified explicitly, any possible paths which are extensions of a path which is given explicitly in a DATR fragment will have the same value as that path. Once extensions are stipulated at a given node, then the value or reference to a value which these extensions have as their right-hand sides will be inherited by all nodes that inherit that information. In other words, the more specific (longest) path wins. However, that node, and others which inherit from it, will still inherit the default value or reference to a value for the other extensions of the path which are not explicitly specified. Some of the implications of the way DATR uses the notion 'longest path wins' are discussed in §5. 3 The term inheritance The idea of inheritance has been around for some time in both computer science in general and work in Artificial Intelligence in particular (Fahlman 1979; Brachman 1985). When we talk of inheritance we mean, in a loose sense, the sharing of information. Hierarchies and networks contain nodes where information is stored. A particular node may inherit information from another node. For example, in a particular human language it would be quite natural to say that words consist of a stem plus a suffix, and this will also be the same for nouns. We can represent this in DATR as in (3). 3 (3) WORD: <mor form> == "<stem>" "<suffix>". NOUN: <> == WORD. This inheritance hierarchy can be represented diagrammatically as in Figure 1. WORD NOUN Figure 1: A simple hierarchy For the language in question we also know that verbs consist of a stem plus a suffix. We can therefore add a further node to our hierarchy, as in (4), and modify figure 1 to figure 2. (4) WORD: <mor form> == "<stem>" "<suffix>". NOUN: <> == WORD. VERB: <> == WORD. WORD VERB NOUN Figure 2: A less simple hierarchy We might wish to add adjectives to our example. We could also add syntactic information about word class. We give the DATR representation of this hierarchy in (5). 4 (5) WORD: <mor form> == "<stem>" "<suffix>". NOUN: <> == WORD <syn cat> == n. ADJ: <> == WORD <syn cat> == a. VERB: <> == WORD <syn cat> == v. What we have so far is a hierarchy of information where the nodes NOUN, VERB and ADJ inherit information from a higher source. This is often referred to as a mother-daughter relationship, and so the notion of inheritance here is tied up with genetic inheritance. Other kinds of inheritance are possible. Multiple inheritance involves a node acquiring information from more than one source. As information from the two sources may be contradictory this can lead to problems over deciding which has precedence. DATR makes use of a particular kind of multiple inheritance, namely orthogonal multiple inheritance. This means that a DATR representation must always be explicit about what kind of information is required from which source, thereby avoiding such contradictions. As a concrete linguistic example of multiple inheritance, we could take an instance in which a particular noun inherits information as part of a morphological hierarchy. It will inherit information directly from the node NOUN, which in turn inherits from the node WORD. However, it will get its information about its semantics from another node which is not part of the morphological hierarchy. The representation of the noun DOG might look something like (6). (6) Dog: <> == NOUN <stem> == dog <sem> == FURRY_ANIMAL. In this example Dog inherits all possible information from NOUN, except its value for stem and its semantics which it obtains from the node FURRY_ANIMAL, which could be part of another hierarchy. We could represent the inheritance relations for Dog diagrammatically as in figure 3. 5 WORD FURRY_ANIM AL NOUN VERB ADJ Dog Figure 3: Multiple inheritance The purpose of this discussion is to explain what the term 'inheritance' may mean. When people first come to use a representation language such as DATR, they will bring with them particular conceptions about what 'inheritance' means. As a metaphor inheritance evokes strong associations in people's minds, and the associations that they make can influence the way they think about inheritance as it is used when talking of inheritance hierarchies and networks. Many people would associate inheritance with hierarchies (but not networks), as the metaphor of gene inheritance is strongest for them. There is an association which assumes that inheritance can only be from an immediate ancestor, as we, strictly speaking, only inherit our genetic properties from our parents. If people conceive of inheritance in this way, they rule out anything that is not involved in grandmother-mother-daughter relations. To compound the problem researchers who have used inheritance of some kind or other have traditionally made little distinction between hierarchies, networks, lattices and so on. It is, however, possible to understand inheritance in terms of property and wealth. In this sense it is possible to inherit money directly from an aunt, or even from someone who is not related to you. The different kinds of inheritance referred to in knowledge representation can be understood in terms of the different metaphors evoked here. For example, it is possible to understand default inheritance, because of its hierarchical nature, in terms of genetic inheritance. It is easier to understand multiple inheritance in terms of the property inheritance metaphor, because it does not necessarily involve inheritance from an immediate ancestor. 4 Querying lexical entries When we talk of querying a node, we mean asking for inferable information about that node. When we are dealing with linguistic theories this will mean querying lexical entry nodes for information that has been declared to be interesting. In many implementations of DATR the way of specifying sensible queries is to declare 'show' paths which state what kind of information you would wish to know about a node. 6 For example to automatically query <stem> <singular> <plural> type #show <stem> <singular> <plural> . It is possible to query nodes in the representation of the theory which are not lexical entries and for which it does not make sense to make certain queries. For instance, it does not make sense to ask for the stem of the node NOUN, because this node does not specify a stem. Normally, one would not wish to query the nodes at which more general information is stored for others to inherit from. 'Hide' declarations are used to hide nodes which do not need to be queried: #Hide NOUN Count_Noun. 5 The empty path and maximally underspecified sources of inheritance In order to represent the default source of inheritance, the empty angle brackets are used. The term 'empty path' is also used. If we consider the equation type (1b) again, where Node1 inherits its value by reference to an identical path at a different node (Node2), it is clear that the default source of inheritance is that other node which a particular node specifies as the place at which to find the value for the 'empty path'. Given the principle of path extension, all paths are extensions of the empty path, and so the value for all paths is inherited from the default source, unless that extension is explicitly stated in an equation at the inheriting node. This means, for example, that Node2 in the equation Node1:<> == Node2 is the default source (which could also be called the maximally underspecified source), and that Node1 will inherit the values for all paths at Node2, unless additional equations at Node2 specify otherwise. Most of the time it will be obvious what the maximal source of inheritance is, but there are two situations where it is less clear. In the first, consider a hierarchy where there are two 'high' nodes A and B which store information that is generalised over a sub-node C. Now we want C to inherit all the paths from both nodes. In other words, we want two maximally underspecified sources of inheritance, as represented in figure 4. As mentioned in § 1, DATR makes use of orthogonal multiple inheritance (Touretzky, 1986) to avoid information contradictions. This means that it is impossible to specify two maximally underspecified sources of inheritance, just as it is impossible to specify different sources of inheritance for the same path. In figure 4 there are different paths specified at the nodes A and B. The use of the empty path twice at node C is equivalent to stating that this node should get the values for all paths from A and the values for all paths from B. Intuitively it appears that there might be no contradiction, as the paths at A and B are different, but there are values for the paths <a> and <b> at node B, and that is that they are undefined at B. The fact that they are undefined at B and have values stipulated at A means that there is a contradiction. Equally, the paths <c> and <d> at node A are undefined and contradict the definitions at A. In order to get round the problem posed by figure 4, it must be decided which of the two nodes A or B is the maximally underspecified source of inheritance. The sub-node C could then inherit all possible paths from this node and a subset of information from the other one. 7 A <a> == 1 <b> == 2 B <c> == 3 <d> == 4 C <> == A <> == B <e> == 5 Figure 4 It may also be necessary to choose attributes which specify the kind of information that is being inherited. Consider the following fragment representing an area of the phonological system of Russian. (7a) Fricative: <>== C <continuant> == '[+continuant]' <strident> == '[-strident]'. (7b) Velar_C: <> == C <coronal> == Labial_C <anterior> == Palatoalv_C. (7c) X: <> == Velar_C <continuant> == Fricative <strident> == Fricative. The node C is the top node in a hierarchy which uses underspecification to define the consonants of Russian. The node X (7c), which represents the phoneme /x/, inherits its general information from the node Velar_C (7b). As it is a fricative it also needs to inherit from the Fricative node. The class of fricatives are [+continuant] (to distinguish them from stops) and [-strident] (to distinguish them from affricates). The node X therefore must multiply inherit from Velar_C and Fricative. However, in a sense both Velar_C and Fricative could be considered main sources of inheritance. There is no hierarchical connection between these two nodes. As only one maximally underspecified source of inheritance is permitted, this is given as Velar_C, and X must multiply inherit from Fricative for the paths <continuant> and <strident> (7c). Unfortunately this appears to be introducing redundancy into the system. One solution here would be to prefix paths inheriting from Velar_C with the attribute place and have X inherit all other information from Fricative, so that (7c) now looks like (8). 8 (8) X: <> == Fricative. <place> == Velar_C. Summary DATR forces researchers to be explicit about the kind of information inherited. With multiple inheritance it is not possible to specify two nodes as providing the same kind of information. If a node is not the main source of inheritance, then it may be necessary to use an attribute to identify the kind of information to be inherited from that node. 6 Evaluable paths In addition to the equation types in (1a)-(1g) DATR implementations provide for 'evaluable paths' where the value of a particular path can be evaluated and then added into another path upon evaluation. This enables values available elsewhere in a DATR network to be used as attributes in a path and the declaration of interdependencies determined by the presence of particular information. As an example of an evaluable path, we consider part of the fragment from Brown and Hippisley (1994: 70). In Russian, there is a phonological distinction between palatalised ('soft'), and non-palatalised ('hard') consonants. Whether a stemfinal consonant is soft or hard has consequences for the morphology. For example, a hard class I noun has genitive plural in -ov, but a soft class I noun has genitive plural in -ej. Now there are some consonants which are phonologically hard, but morphologically they behave as though they were soft. Nouns stems of declension I ending in such consonants attach -ej in the genitive plural. One such consonant is the voiced palatoalveolar fricative /ž/. Thus the genitive plural of muž 'husband' is mužej, and not *mužov. This phonology-morphology interdependency can work the other way. Thus the consonant /j/ is phonologically soft but morphologically hard, so that the genitive plural of tramvaj 'tram' is tramvajov.2 Note that /j/ can also appear as a special suffix that creates plural stems as in brat- (sing.) 'brother' > bratj- 'plural'. Again, /j/ is morphologically hard so that the genitive plural is bratjov. In order get the right value for morphological hardness, i.e. hard or soft, we have to attach the condition that if the consonant of the lexical entry in question is phonologically soft then we want soft, and if hard then hard. However, if it is soft but a /j/ then we want hard, and if it is hard and /ž/ then we want soft. Thus we will have a node (12) which contains two possible values, soft and hard, and the conflict arising as to which one to attach will be resolved by referring to information specific for the lexical entry in question. In (9a) information from a lexical entry is evaluated and added into the path. Evaluation of the paths within the evaluable path on the right-hand side of (9a) will determine which path at the node MORPHARD (10) is referenced for the value of the path <mor stem hardness>. 2The forms given here are in phonological transcription, not standard orthography or transliteration. 9 (9) NOUN: a. <mor stem hardness> == MORPHARD: <"<phon stem hardness>" "<suffix pl>" "<infl_root final>"> ... (10) MORPHARD: a. <soft> == soft b. <soft j> == hard c. <soft none j> == hard d. <hard> == hard e. <hard none š> == soft f. <hard none ž> == soft. Care must be taken that paths to be evaluated in evaluable paths are defined. For (11) <phon stem hardness> is undefined, as only its extension <phon stem hardness sg> is specified in the lexical entry. If the value for <phon stem hardness> is stated to be 'hard' by a default elsewhere in the network, then the noun in (11) will inherit the value 'hard' for <phon stem hardness>, which may conflict with the information given by the path extension. (11) Kost´: <> == N_III <gloss> == bone <infl_root> == kost´ <infl_root final> == t´ <stress> == Stress_3i <phon stem hardness sg> == soft <sem animacy> == inanimate. Summary There may be occasions when we wish to state that the value for a path is dependent on the values that other paths may have. To encode this kind of dependent knowledge, we use evaluable paths. We must be careful that the paths to be evaluated are defined elsewhere in the network. Furthermore, if there are examples of paths elsewhere in the network which extend any of the paths within an evaluable path, these will not be evaluated. 10 7 Global inheritance In §1 we considered the term 'inheritance' and the possible ways in which the metaphor of inheritance could be understood. In example (3) in §1 a toy fragment was given in which the generalisation was stated that a word might consist of a stem plus a suffix. It is obvious that at the node which generalises over all words, allowing for overrides of the default, the actual value which is the realization of the attribute stem cannot be stated, as it will differ from lexical item to lexical item. It is possible to understand global inheritance as a statement to the effect that the value for a particular path is found by determining the value for another path at the node which is being queried (see § 2). For instance, we could add to our toy example (5) in §1 by stating that by default there is no suffix (12). (12) WORD: <suffix> == <mor form> == "<stem>" "<suffix>". NOUN: <> == WORD. If we consider a noun like DOG we would find that, by default, its morphological form is just the stem (13). (13) Dog: <> == NOUN <stem> == dog <sem> == FURRY_ANIMAL. In the DATR equation at WORD we see that the path "<stem>" is quoted, because its value will depend on the value specified at the query node Dog. The path "<suffix>" is also quoted. In this case the value for <suffix>, which is nothing, is inherited by the node Dog from WORD via NOUN. If the paths were not quoted, this would be 'local inheritance'. Not quoting the paths means that the value is obtained locally at the node WORD. So the value for <mor form> would be the concatenation of the value for <stem> as specified at WORD, or inherited by WORD from a higher node, and the value of <suffix> as specified at WORD. Of course, there is no value specified for <stem> at WORD and so the equation could not be evaluated. References Brachman, Ronald J. 1985. I lied about the trees. Or, defaults and definitions in knowledge representation. AI magazine 6. 80-93. Brown, Dunstan; and Hippisley, Andrew. 1995. DATR for linguists. Deliverable for ESRC grant # R000233633 and Leverhulme Trust grant #F.242M. Brown, Dunstan and Andrew Hippisley. 1994. Conflict in Russian Genitive Plural Assignment: A Solution Represented in DATR. Journal of Slavic Linguistics 2. 48-76. 11 Evans, Roger and Gerald Gazdar. 1989a. Inference in DATR. Proceedings of the 4th Conference of the European Chapter of the Association for Computational Linguistics, 66-71. Manchester, England. Evans, Roger and Gerald Gazdar. 1989b. The semantics of DATR. In: A. G. Cohn (ed.) Proceedings of the Seventh Conference of the Society for the Study of Artificial Intelligence and Simulation of Behaviour, 79-87. London: Pitman/Morgan Kaufmann. Evans, Roger and Gerald Gazdar. 1996. DATR: A Language For Lexical Knowledge Representation. Computational Linguistics 22 (2). 167-216. Fahlman, Scott E. 1979. Representing and using real-world knowledge. In Patrick H. Winston and R. H. Brown (eds.), Artificial Intelligence: An MIT perspective. Volume 1, 451-70. Cambridge, Mass: MIT Press. Gazdar, G. 1989b. An introduction to DATR. In: Evans, R. and Gazdar, G. The DATR Papers. Brighton: University of Sussex Cognitive Science Research Paper, CSRP 139 (1990). 1-14. Gazdar, Gerald. Forthcoming. Ceteris paribus. To appear in J. A. W. Kamp and C. Rohrer (eds.) Aspects of computational linguistics. Berlin: Springer. Jenkins, Elizabeth. 1990. Enhancements to the Sussex Prolog DATR Implementation. In: Evans, R. and Gazdar, G. The DATR Papers. Brighton: University of Sussex Cognitive Science Research Paper, CSRP 139 (1990). 41-62. Touretzky, David S. 1986. The Mathematics of Inheritance Systems. London: Pitman. 12