PrepNet: a Framework for Describing Prepositions: Preliminary Investigation Results
Patrick Saint-Dizier, IRIT-CNRS, France

Long-term objectives
• Construct a repository of preposition syntactic and semantic behaviors,
• Develop a multi-level approach, from prototypical uses to unexpected ones, that accounts for the diversity of preposition uses and for their polysemous behavior,
• Develop a relatively shallow semantic characterization based on frames,
• Investigate the verb-preposition-NP relations: restrictions and compositionality,
• Develop a multilingual approach.
Applications: MT, knowledge extraction, QA, etc.

This paper: basic elements of a preliminary approach
• Introduce a general characterization of preposition senses viewed as abstract notions,
• Characterize these abstract notions by means of frames (viewed as linguistic or conceptual macros),
• Populate preposition frames via corpus and then validate,
• Develop a multi-level characterization of preposition uses, to organize the diversity of their uses in language,
• Raise a few questions about multilinguality (prepositions can be realized by other categories or by morphology in some languages),
• Investigate evaluation methods, in abstracto and via applications.

Related work
• Very little in CL circles compared to verbs and nouns, even though prepositions are needed in a number of applications (MT, IE, QA, …),
• Almost nothing in EWN, FrameNet or VerbNet,
• Some valuable work in AI, e.g. temporal and spatial reasoning,
• A few isolated works in linguistics, each on a given preposition,
• Quite a lot of work in psycholinguistics.
Other resources: B. Dorr's large description for English, with MT in view (about 500 entries).

Why is that so?
• High polysemy (but maybe not more than adjectives? And the number of items is smaller: 95 prepositions in French plus compounds, 32 in Spanish; there is not always agreement on what a preposition is),
• Linguistic realizations are very difficult to predict: large number of idiosyncratic uses and cross-linguistic differences,
• Syntactic difficulties due to the chain V-Prep-N, e.g. PP-attachment problems, VPC,
• Deep level in the semantic-cognitive structure: prepositions are often used in metalanguages as primitives.
Only compositional uses of prepositions are studied here.

Global architecture of the proposal
[Figure: Prep. senses: 3-level set of abstract notions; shallow semantic representation with strata; uses in language 1, uses in language 2, etc.]

General architecture (1): categorizing preposition senses
Preposition categorization on 3 levels:
- Family (roughly thematic roles): localization, manner, quantity, etc.
- Facets, e.g. for localization: source, position, destination, etc.
- Modalities.
Facets are viewed as the abstract notions on which PrepNet is based. 12 families are defined.

Families / facets
Quantity: numerical / frequency / proportion
Accompaniment: adjunction / simultaneity / inclusion / exclusion
Manner: means / manners and attitudes / imitation or analogy
Localisation: source / destination / via / fixed position
Choice and exchange: exchange / choice or alternative / substitution
Causality: cause / goal or consequence / intention
Opposition
Ordering: priority / subordination / hierarchy / ranking / degree of importance
Minor elements: about, in spite of, comparison (see examples in paper)

Conceptual / ontological status of these distinctions?
• Families: 'superframes': general principles and restrictions,
• Facets: frames; strata: subframes, with some general forms of inheritance and property consistency,
• Whenever appropriate: modalities as subframes.
Frames are viewed as linguistic macros, to be interpreted. They are shallow or coarse-grained representations so far. Language realizations are a priori associated with the lower-level frame nodes.
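A minimal sketch of the family / facet level as a lookup table, for illustration only. The dictionary layout and the helper name `family_of` are assumptions, not part of PrepNet; only the families whose facets are listed above are included, and modalities are left out.

```python
# Hypothetical encoding of the family / facet categorization listed above.
# Keys are families, values are their facets (empty when none are listed).
FAMILIES = {
    "Quantity": ["numerical", "frequency", "proportion"],
    "Accompaniment": ["adjunction", "simultaneity", "inclusion", "exclusion"],
    "Manner": ["means", "manners and attitudes", "imitation or analogy"],
    "Localisation": ["source", "destination", "via", "fixed position"],
    "Choice and exchange": ["exchange", "choice or alternative", "substitution"],
    "Causality": ["cause", "goal or consequence", "intention"],
    "Opposition": [],
    "Ordering": ["priority", "subordination", "hierarchy", "ranking",
                 "degree of importance"],
}


def family_of(facet):
    """Return the family that contains the given facet, or None if unknown."""
    for family, facets in FAMILIES.items():
        if facet in facets:
            return family
    return None


print(family_of("via"))    # the facet 'via' belongs to Localisation
print(family_of("means"))  # the facet 'means' belongs to Manner
```

A flat table is enough at this level; the frames and strata attached to each facet would need a richer structure.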
General architecture (2): a conceptual, prelexical structure
[Figure: the frame of an abstract notion (name + gloss, shallow restrictions, simplified LCS representation) dominates the strata of the abstract notion, i.e. the subframes SF1, SF2, SF3.]

Structure of a frame
• Structure:
- Number, name, gloss,
- Frame with shallow constraints: X <Action> Y [Number] Z,
- Conceptual representation in simplified LCS (kind of LST),
- In the future: inferential patterns (within a frame or among frames).
195 senses / abstract notions are described using 65 primitives.
Shallow constraints:
(1) generic semantic types,
(2) generic verb class types from WordNet,
(3) generic semantic fields from the LCS: temp, poss, loc, psy, epist, perc, amount, comm, prop, abs, etc.

Example 1: 'via'
[1] : VIA - generic.
'An entity X moving via a location Y'
X <ACTION> [1] Y
X: concrete entity, ACTION: movement verb, Y: location
representation: X : via(loc, Y)
French synset: {par, via}
example: Jean rentre par la porte (Jean comes in through the door)

Stratification 1:
[1.1] : VIA - narrow passage.
'An entity X moving via / an action that uses a narrow passage in an object Y'
X <ACTION> [1.1] Y
X: concrete entity, ACTION: perception verb, Y: location with a narrow passage
representation: X : through(loc or temp, Y)
French synset: {à travers, au travers de, dans}
example: Jean regarde à travers la grille / dans les jumelles (Jean looks through the gate / through the binoculars).

Example 1, cont'd:
Stratification 2:
[1.2.1] VIA UNDER - from generic
'An entity X moving via under a location Y'
X <ACTION> [1.2.1] Y
X: concrete entity, ACTION: movement verb, Y: location with a form of passage under it
representation: X : via(loc, under(loc,Y))
French synset: {par dessous}
example: Jean passe par dessous le pont (Jean passes under the bridge).
[1.2.2] VIA ABOVE - from generic
etc.
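The 'via' entries above can be sketched as a small data structure. This is an illustrative encoding, not PrepNet's actual format: the class `Frame`, its field names and the override-style inheritance in `resolved_constraints` are assumptions; the slides only state that strata have "some general forms of inheritance".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    """Hypothetical container for one abstract notion or one of its strata."""
    number: str
    name: str
    gloss: str
    constraints: Dict[str, str]   # shallow constraints stated locally
    representation: str           # simplified LCS form, kept as a string
    synset_fr: List[str]          # French realizations
    parent: Optional["Frame"] = None

    def resolved_constraints(self) -> Dict[str, str]:
        """Merge inherited constraints with local ones (local ones win)."""
        inherited = self.parent.resolved_constraints() if self.parent else {}
        return {**inherited, **self.constraints}


via = Frame(
    number="1", name="VIA - generic",
    gloss="An entity X moving via a location Y",
    constraints={"X": "concrete entity", "ACTION": "movement verb",
                 "Y": "location"},
    representation="X : via(loc, Y)",
    synset_fr=["par", "via"],
)

via_narrow = Frame(
    number="1.1", name="VIA - narrow passage",
    gloss="An entity X moving via / an action that uses a narrow passage "
          "in an object Y",
    constraints={"ACTION": "perception verb",
                 "Y": "location with a narrow passage"},
    representation="X : through(loc or temp, Y)",
    synset_fr=["à travers", "au travers de", "dans"],
    parent=via,
)

via_under = Frame(
    number="1.2.1", name="VIA UNDER",
    gloss="An entity X moving via under a location Y",
    # Only Y is restated; X and ACTION come from the generic frame.
    constraints={"Y": "location with a form of passage under it"},
    representation="X : via(loc, under(loc,Y))",
    synset_fr=["par dessous"],
    parent=via,
)

print(via_under.resolved_constraints())
```

With this encoding a stratum states only the constraints that differ from its parent, and the resolved constraints for [1.2.1] match the full list given on the slide.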
Example 2: instruments
Stratification requires taking into account 2 relations, characterized by means of primitives (Mari and Saint-Dizier 03):
- Actor / instrument: undergo (no control), select (controls another prop.), control,
- Instrument / V+NP object: be (passive, but participates), react (other prop. than controlled by the agent), act (full participation).
Contrast: cut the bread with a knife / eat soup with a spoon / John burned himself with boiling oil.
A generic entry for instruments and, potentially, 9 strata (combinations), depending on the language; 4 strata for French.

Example 2, cont'd
[5] : MANNER - MEANS - Instrument
'Someone X doing an action Y using instrument Z.'
X <ACTION> Y [5] Z
X: human, ACTION: verb of change, Y: object, Z: instrument
representation: X: by-means-of(_, Z)
Followed by a priori 9 strata. Example: application to French:
1. Be(X,Z) ∧ Undergo(Z, Action+Y) : synset: {grâce à}, restrictions…
2. Be(X,Z) ∧ Select(Z, Action+Y) : synset: {par}, restrictions…
3. Select(X,Z) ∧ React(Z, Action+Y) : synset: {avec}, restrictions…
4. Act(X,Z) ∧ Control(Z, Action+Y) : synset: {avec, au moyen de}, …

General architecture (3): the language realization level
[Figure: each lower-level subframe SFi is linked to a multi-level partitioning of realizations derived from usage norms: direct uses, indirect uses, etc. Each partition pairs restrictions with a synset (restr1 / synset1, restr2 / …, restr3 / synset3); derived types and their synsets are left open (??).]
… + frequency measures

Populating preposition frames from corpora
• Conceptual frames are associated with shallow constraints.
Moving on to the language level, elements of a method:
• For a given language: associate each frame stratum with corpus and dictionary observations,
• Manual analysis: identify prototypical uses and promote usage norms, yielding a multi-level partitioning of realizations,
• Contrast, if possible, direct versus indirect (mainly metaphorical) realization levels,
• Elaborate the conceptual/ontological status of categorizations and related constraints (mainly semantic types).

A few notes
• Multi-level architecture: helps to account for the large variety of (compositional) behaviors; partitioning strategies need to be investigated in more depth; is incremental depth, to get finer-grained analyses, worth pursuing?
• For each synset: develop frequency measures and identify contexts of use (from syntactic context to type of text); frequency rates are very diverse (some uses are only found in dictionaries!),
• Populate, but then validate on new corpora: develop several forms of corpus annotations (the frame; the relation with the head, with the NP, etc.).

Looking at other languages
• Hypothesis: given an abstract notion (interlingua), translations are constructed on the basis of the restrictions that hold on the corresponding synsets, BUT:
• Large realization variations are in general observed, even for closely related languages: up to what point are these just surface language contrasts? Or are they also conceptual? E.g. Regarder dans le microscope / look through the microscope (durch; a través de),
• Some languages do not use pre-/postpositions so much, but rather other categories, incorporation in heads, or just case marks.
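The per-synset frequency measures mentioned in the notes on populating frames could be computed along the following lines. This is only a sketch under stated assumptions: the input format (pairs of frame number and preposition taken from an annotated corpus), the function name `relative_frequencies` and the toy data are all hypothetical, and no real corpus counts are implied.

```python
from collections import Counter, defaultdict


def relative_frequencies(annotations):
    """Relative frequency of each preposition within each frame.

    `annotations` is an iterable of (frame_number, preposition) pairs,
    one pair per annotated corpus occurrence (hypothetical format).
    """
    counts = defaultdict(Counter)
    for frame_number, preposition in annotations:
        counts[frame_number][preposition] += 1
    return {
        frame_number: {
            prep: n / sum(counter.values()) for prep, n in counter.items()
        }
        for frame_number, counter in counts.items()
    }


# Toy annotations, invented for illustration only.
toy = [("1", "par"), ("1", "par"), ("1", "via"), ("1.1", "à travers")]
freqs = relative_frequencies(toy)
print(freqs)
```

A synset member that never appears in the annotated corpus simply gets no entry, which is one way to spot the uses that are only found in dictionaries.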
Preliminary conclusions
• Preliminary investigation to identify difficulties and organize the research,
• The global architecture looks like an interesting approach,
• Abstract notion definitions seem to be quite stable; the status of strata needs further investigation,
• The multi-level approach to language realizations seems a good direction, but needs much larger testing on a number of languages and a clearer method to organize sets of realizations,
• Implement an open system on the Web.

Some obvious research directions
• Ontological/conceptual status of categorizations and restrictions,
• Investigate integration with other frameworks: VerbNet, FrameNet,
• Investigate preposition polysemy and derived uses in more depth, and ways to characterize them,
• Relations head-preposition-NP, and compositionality (the head is often a verb, but can be any other kind of predicate): some PPs have wider scope over the proposition,
• Inferential patterns associated with prepositions (e.g. for approximation notions, spatial notions, etc.).