Emelyanov G., Mikhailov D.

FORMALIZATION OF THE WORD'S LEXICAL MEANING IN A PROBLEM OF RECOGNITION OF NATURAL LANGUAGE'S STATEMENTS'S SYNONYMY'S SITUATIONS ¹

G. M. Emelyanov 2 , D. V. Mikhailov 2

2 Yaroslav-the-Wise Novgorod State University,

173003, Russia, Velikii Novgorod, ul. Bol'shaya St. Petersburgskaya, 41, tel.: (8162)627940, e-mail : gem@novsu.ac.ru (G.M. Emelyanov), mdv@novsu.ac.ru (D.V. Mikhailov)

An approach is presented to formalizing the semantic correlations between a lexeme and its lexical correlates in the problem of recognizing situations of synonymy. The situations of synonymy are described on the basis of standard Lexical Functions. The paper presents principles for generalizing independent descriptions of the theory of a word's Lexical Meaning.

Introduction

Applying the apparatus of standard Lexical Functions (LF) within the framework of the “Meaning ⇔ Text” approach makes it possible to solve the problem of proving the synonymy of Natural Language (NL) statements on the basis of a finite set $Rls$ of correctly formalizable rules for transforming Deep Syntactic Structures (DSS) [4, 5]. Nevertheless, a significant difficulty in implementing $Rls$ is formalizing the applicability conditions of the rules. For every $rl \in Rls$ the applicability condition is a set of requirements on the syntactic and semantic properties of the lexical units replaced by $rl$.

Let us consider the problem of proving the LF-synonymy of statements as a classical problem of Pattern Recognition (PR). The set $L$ of all pairs of NL statements between which LF-synonymy can be established (with respect to $Rls$) is the initial set of objects to be classified. The provability of LF-synonymy for pairs $l \in L$ with respect to a fixed $rl \in Rls$ is then the basis for grouping them into one taxon. The applicability condition of $rl$ thus acts as a precedent, that is, as a typical representative of the taxon $rl$.

The formulation of the PR problem: a new pair $l \in L$ that did not take part in the taxonomy is presented. It is required to analyse the applicability conditions and to recognize the class pattern $rl \in Rls$ to which the object $l$ is most similar. The problem statement: to develop a program-realizable representation of the applicability conditions by revealing the character of the semantic correlations between a lexeme and its lexical correlates for the basic types of Lexical Functions.

Since the redistribution of sense that matters for formalizing the applicability conditions is characteristic of situations with parametric LFs, the PR problem given above should be formulated as recognition of the semantic relation specified by a split value. Revealing and generalizing this relation is directly analogous to describing the semantics of Noun Phrases [1]. For the Lexical Meanings (LM) of the words replaced by $rl$, formalized descriptions are constructed in the form of theories, that is, sets of meaning postulates connecting each replaced word with other words and concepts. Nevertheless, when different researchers construct the theory of one word independently, the problem arises of generalizing the knowledge obtained in this way. This problem is especially pressing when theories are constructed from NL definitions of LMs using standard conceptual languages [2].

¹ This work is financially supported by RFBR (project №06-01-00028) and by ESIC of NovSU.

Solution methods

Suppose that for every $Lec_i^j \in l_j$, $l_j \in L$, we have a description of the theory of its Lexical Meaning in the form of a compound object of the Prolog language:

$$\mathit{lmth}(\mathit{Lec\_j\_i},\ \mathit{Var\_Smth},\ \mathit{Rel\_list}) \qquad (1)$$

By means of a list Rel_list of structures of the kinds (2), (3) and (4), this object describes a set of binary relations Rel between concepts Cncpt1 and Cncpt2:

$$\mathit{rel2}(\mathit{Rel},\ \mathit{Cncpt1},\ \mathit{Cncpt2}) \qquad (2)$$

and recursively defined relations of arbitrary arity:

$$\mathit{rel2\_complex}(\mathit{Rel},\ \mathit{Cncpt},\ \mathit{Rel\_list}) \qquad (3)$$

$$\mathit{rel\_complex}(\mathit{Rel},\ \mathit{Rel\_list}) \qquad (4)$$
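The nesting of structures (1)-(4) can be made concrete with a small sketch. Python is used here only as a stand-in for the Prolog compound objects. The class names, the `count_statements` helper and the toy content of the theory are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Rel2:
    """Structure (2): rel2(Rel, Cncpt1, Cncpt2)."""
    rel: str
    cncpt1: str
    cncpt2: str


@dataclass(frozen=True)
class Rel2Complex:
    """Structure (3): rel2_complex(Rel, Cncpt, Rel_list)."""
    rel: str
    cncpt: str
    rel_list: Tuple["Statement", ...]


@dataclass(frozen=True)
class RelComplex:
    """Structure (4): rel_complex(Rel, Rel_list)."""
    rel: str
    rel_list: Tuple["Statement", ...]


Statement = Union[Rel2, Rel2Complex, RelComplex]


@dataclass(frozen=True)
class Lmth:
    """Structure (1): lmth(Lec_j_i, Var_Smth, Rel_list)."""
    lec: str
    var_smth: str
    rel_list: Tuple[Statement, ...]


def count_statements(rel_list) -> int:
    """Count statements of kinds (2) and (3) at all nesting levels."""
    total = 0
    for stmt in rel_list:
        if isinstance(stmt, (Rel2, Rel2Complex)):
            total += 1
        if not isinstance(stmt, Rel2):
            total += count_statements(stmt.rel_list)
    return total


# Toy theory; the relations and concepts are invented for illustration.
theory = Lmth(
    lec="aggressor",
    var_smth="X",
    rel_list=(
        Rel2("subject", "X", "state"),
        Rel2Complex("action", "X", (
            RelComplex("or", (
                Rel2("object", "attack", "state"),
                Rel2("goal", "attack", "seizure"),
            )),
        )),
    ),
)
```

Frozen dataclasses keep statements hashable and comparable, which is convenient when statements from different variants of a theory have to be matched against each other.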

An LM given by (1) is a denotation to which logic assigns an extension [3], that is, the class of entities defined by (1). For every $Lec_i^j$ its sense (or intension [3]) is, from the philosophical point of view, a network of relations between $Lec_i^j$ and other words $Lec_m^k$. The sense of a lexeme can therefore be defined by a set of functions specified by statements of the kinds (2), (3) and (4) within theories. These functions characterize the concepts designated by $Lec_i^j$. Following the terminology of [3], we call such functions Characteristic Functions (ChF) for a set of Lexical Meanings. As we have shown in [1], each of them can be specified either by a single statement or by a group of statements.

When structure (1) is used to describe the theory of the LM Lec_j_i, the value of each of these functions equals the third argument Cncpt2_mng_fn of the relation in some statement of the kind (2). Cncpt2_mng_fn must designate a concept known to the system (this concept is identified with the Semantic Class (SCl) of some word). The name of the ChF corresponds to the first argument Rel_name_fn of the first statement of the kind (2) or (3) that designates a known SCl (this SCl must define a relational noun). This statement is found by scanning the list Rel_list of statements of the kind (2) for the given $Lec_i^j$ backwards from the statement that has the above Cncpt2_mng_fn as its third argument. Here Rel_list may also be the list that forms the third argument of a statement of the kind (3) whose first argument is Rel_name_fn. The list formed during this scan of Rel_list is denoted below by Rel_list_fn, $\mathit{Rel\_list\_fn} \subseteq \mathit{Rel\_list}$.

The second argument of the statement with Rel_name_fn must be the variable Var_Smth, which designates the word interpreted by (1). Each subsequent statement in the list Rel_list_fn must share at least one argument that designates a variable with the preceding statement.
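The backward scan described above can be sketched as follows. This is a minimal illustration under stated assumptions: statements of the kind (2) are modelled as plain `(rel, arg1, arg2)` tuples, variables follow the Prolog convention of starting with an uppercase letter, and the set `known_scl` of Semantic Classes and the toy statements are invented for the example. The function names are hypothetical.

```python
def is_var(term):
    """Assumed Prolog-like convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def chained(chain):
    """Check that each statement shares a variable with the previous one."""
    for prev, cur in zip(chain, chain[1:]):
        prev_vars = {t for t in prev[1:] if is_var(t)}
        cur_vars = {t for t in cur[1:] if is_var(t)}
        if not prev_vars & cur_vars:
            return False
    return True


def extract_chf(rel_list, known_scl, var_smth):
    """Return (Rel_name_fn, Cncpt2_mng_fn, Rel_list_fn) triples."""
    results = []
    for end, (_, _, arg2) in enumerate(rel_list):
        if arg2 not in known_scl:
            continue  # the value of a ChF must be a known Semantic Class
        # Scan backwards from the statement that carries the value.
        for start in range(end, -1, -1):
            name, arg1, _ = rel_list[start]
            if name in known_scl and arg1 == var_smth:
                rel_list_fn = rel_list[start:end + 1]
                if chained(rel_list_fn):
                    results.append((name, arg2, rel_list_fn))
                break  # only the first matching statement names the ChF
    return results


# Toy data, invented for illustration.
known_scl = {"initiator", "war"}
rel_list = [("initiator", "X", "E"), ("kind", "E", "war")]
chfs = extract_chf(rel_list, known_scl, "X")
```

In this toy case the scan yields one ChF named `initiator` with the value `war`, and the two statements are accepted as its Rel_list_fn because they share the variable `E`.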

According to the definition of sense as intension formulated in [3], outwardly different descriptions (1) of theories of the same LM yield a common set of the ChFs mentioned above. Together they define the intension for the generalized theory of the LM under consideration.

Proceeding from the definition of intension as a function from possible worlds to extensions [3], and from the recursive nature of meaning postulates, we pose the task of constructing the generalized theory of a given LM from independently obtained variants of its theory as the restoration of a syntactic representation [3] of the extension from the known syntax of the expressions for the ChFs that make up the intension and are written as sets of statements of the kinds (2), (3) and (4). We have a ternary relation $I \subseteq G \times M \times W$ between:

• a set of objects $G$, which correspond to the variants $\mathit{lmth}_i^{oj}$ of the definition of the LM of $Lec_i^j$ in the form (1), $G = \{\mathit{lmth}_i^{oj}\}$;

• a set of attributes $M$, which correspond to the values $\mathit{Cncpt2\_mng\_fn}_i^{poj}$ of the Characteristic Functions for $\mathit{lmth}_i^{oj}$;

• a set $W$ of attribute values. In our task each $w \in W$ is the name $\mathit{Rel\_name\_fn}_i^{poj}$ of a ChF whose value belongs to $M$.

The relation $I$ can be considered as a binary relation

$$I_1 \subseteq \{(\mathit{lmth}_i^{oj},\ \mathit{Cncpt2\_mng\_fn}_i^{poj})\} \times W.$$

According to the Basic Theorem of Formal Concept Analysis (FCA) [4], proved by G. Birkhoff, a complete lattice can be constructed for any binary relation. This makes it possible to apply the mathematical apparatus of FCA to our problem.
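How a binary relation gives rise to a lattice of formal concepts can be illustrated with a brute-force sketch. It assumes that each variant of a definition has already been reduced to a set of (ChF name, ChF value) pairs; the toy context and the function names are invented for the example, and enumerating all object subsets is practical only for very small contexts.

```python
from itertools import combinations


def intent_of(objs, context, attributes):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    common = set(attributes)
    for g in objs:
        common &= context[g]
    return frozenset(common)


def extent_of(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def concepts(context):
    """All formal concepts (extent, intent) of a binary context."""
    attributes = set().union(*context.values()) if context else set()
    found = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = intent_of(subset, context, attributes)
            found.add((extent_of(intent, context), intent))
    return found


# Toy context, invented for illustration: objects are variants of a
# definition, attributes are (ChF name, ChF value) pairs.
context = {
    "Definition1": {("initiator", "war")},
    "Definition2": {("initiator", "war"), ("subject", "state")},
    "Definition3": {("initiator", "war"), ("violator", "peace")},
}
lattice = concepts(context)
```

For this context the sketch finds four concepts: the top concept covering all three definitions, one concept for each of Definition2 and Definition3, and the bottom concept with an empty extent.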

Taking into account the complex character of postulates of the kinds (3) and (4), we extend the set $M$ of formal attributes with the first arguments $\mathit{Rel\_from\_Arg}_i^{qpoj}$ of the statements of the kind (2) that are elements of $\mathit{Rel\_list\_fn}_i^{poj}$ for the given variant $\mathit{lmth}_i^{oj}$ of the theory $\mathit{lmth}_i^{j}$ (we denote the resulting set by $M_1$, and the set of Formal Concepts (FC) extended in this way by $G_1$). Like $\mathit{Cncpt2\_mng\_fn}_i^{poj}$, $\mathit{Rel\_from\_Arg}_i^{qpoj}$ must designate a Semantic Class known to the system. Moreover, as a rule, $\mathit{Rel\_from\_Arg}_i^{qpoj}$ is associated with some relation specified by the noun that names $\mathit{Rel\_from\_Arg}_i^{qpoj}$. Thus $\mathit{Rel\_from\_Arg}_i^{qpoj}$, together with $\mathit{Cncpt2\_mng\_fn}_i^{poj}$, will in effect characterize the FC specified by the pair $(\mathit{lmth}_i^{oj},\ \mathit{Rel\_name\_fn}_i^{poj})$. The value of the attribute $\mathit{Rel\_from\_Arg}_i^{qpoj}$ equals the third argument $\mathit{Cncpt2\_mng\_fn}_i^{qpoj}$ of the first statement of the kind (2) in the list $\mathit{Rel\_list\_fn}_i^{poj}$ (when this list is scanned forwards); $\mathit{Cncpt2\_mng\_fn}_i^{qpoj}$ must designate a Semantic Class known to the system. The search for such a statement and the formation of the corresponding sublist of Rel_list_fn are carried out by analogy with the formation of Rel_list_fn itself.

Introducing the many-valued context

$$K = (G_1,\ M_1,\ W,\ I) \qquad (5)$$

defines on the set $G$ the relation known in FCA as the “subconcept-superconcept” relation [4]. In addition, for any subset of objects from $G$ the Least Common Superconcept (LCS) and the Greatest Common Subconcept (GCS) can be specified. A set of objects connected by the “subconcept-superconcept” relation with one GCS and/or one LCS is to be regarded as an area. The roles of the LCS and the GCS may be played, respectively, by the top concept and the bottom concept of the lattice [4]. In this paper we require that both the GCS and the LCS of an area be unique. The context (5) can be visualized (Fig. 1) with the specialized software ToscanaJ (http://toscanaj.sourceforge.net), which implements the methods of FCA.
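The LCS and GCS of two formal concepts can be computed directly from their intents and extents, as the following sketch shows. It assumes a binary context in which every variant of a definition is reduced to a set of (ChF name, ChF value) pairs. The toy context and all function names are invented for illustration and do not reproduce the data behind Fig. 1.

```python
def extent_of(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def intent_of(objs, context):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    common = set().union(*context.values())
    for g in objs:
        common &= context[g]
    return frozenset(common)


def lcs(c1, c2, context):
    """Least Common Superconcept: intersect the intents, then derive the extent."""
    intent = c1[1] & c2[1]
    return (extent_of(intent, context), intent)


def gcs(c1, c2, context):
    """Greatest Common Subconcept: intersect the extents, then derive the intent."""
    extent = c1[0] & c2[0]
    return (extent, intent_of(extent, context))


# Toy context, invented for illustration.
context = {
    "Definition1": {("initiator", "war")},
    "Definition2": {("initiator", "war"), ("subject", "state")},
    "Definition3": {("initiator", "war"), ("violator", "peace")},
}
c2 = (frozenset({"Definition2"}), frozenset(context["Definition2"]))
c3 = (frozenset({"Definition3"}), frozenset(context["Definition3"]))
upper = lcs(c2, c3, context)
lower = gcs(c2, c3, context)
```

In this toy lattice the LCS of the concepts for Definition2 and Definition3 is the top concept, which also contains Definition1, and their GCS is the bottom concept with an empty extent.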

Fig. 1. Definitions of the LM of the Russian word “агрессор” (“aggressor”)

Using the definition of an area of the lattice introduced above, applied to the elements of the definition of the LM of the given $Lec_i^j$, let us formally define the key rule for generalizing the statements of theories (1).

Two compared statements of the kind (3),

$$\mathit{rel2\_complex}(\mathit{Rel},\ \mathit{Cncpt},\ \mathit{Rel\_list1}) \quad \text{and} \quad \mathit{rel2\_complex}(\mathit{Rel},\ \mathit{Cncpt},\ \mathit{Rel\_list2}),$$

with coinciding first and second arguments become in the resulting theory a single statement of the kind (3) whose third argument includes a statement (4) that unites the statements from the lists Rel_list1 and Rel_list2 by the “or” relation, provided the following condition holds. The sets of FCs obtained from Rel_list1 and Rel_list2 must form, in the lattice for (5), areas with an LCS which has […]. In the example of Fig. 1 the “or” relation corresponds to the following pairs of FCs:

(“Definition2_of_aggressor”, “Definition3_of_aggressor”);

“Definition1_of_aggressor” and the LCS for the pair (“Definition2_of_aggressor”, “Definition3_of_aggressor”).

Two compared statements of the kind (3),

$$\mathit{rel2\_complex}(\mathit{Rel},\ \mathit{Cncpt},\ \mathit{Rel\_list1}) \quad \text{and} \quad \mathit{rel2\_complex}(\mathit{Rel},\ \mathit{Cncpt},\ \mathit{Rel\_list2}),$$

become in the resulting theory a single statement of the kind (3) whose third argument includes a statement (4) that unites the statements from the lists Rel_list1 and Rel_list2 by the “and” relation, provided the following condition holds. The statements of the lists Rel_list1 and Rel_list2 describe the same FC of (5), but by means of different ChFs. In the example of Fig. 1 this applies to the intent (the set of formal attributes [2]) of the FC “Definition1_of_aggressor” and to the intent of the LCS for the pair (“Definition2_of_aggressor”, “Definition3_of_aggressor”).
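The two merging rules can be summarized in a short sketch. Several points are assumptions made only for illustration: statements are modelled as tuples tagged with the Prolog functor name, the connective is stored as the first argument of the statement (4), the two source lists are kept as separate groups inside that statement, and the caller is assumed to have already checked the lattice condition on (5) and to pass its outcome as `lattice_condition`. The function name and the toy statements are hypothetical.

```python
def merge_rel2_complex(stmt1, stmt2, lattice_condition):
    """Merge two statements of kind (3) with equal first and second arguments.

    lattice_condition is assumed to be computed elsewhere from the lattice:
      "same_fc"     -> both lists describe the same FC via different ChFs ("and")
      "area_lcs"    -> the FC sets form areas with an LCS ("or")
      anything else -> the statements are not merged
    """
    tag1, rel1, cncpt1, list1 = stmt1
    tag2, rel2, cncpt2, list2 = stmt2
    if tag1 != "rel2_complex" or tag2 != "rel2_complex":
        raise ValueError("both statements must be of kind (3)")
    if (rel1, cncpt1) != (rel2, cncpt2):
        return None  # first and second arguments must coincide
    connective = {"same_fc": "and", "area_lcs": "or"}.get(lattice_condition)
    if connective is None:
        return None
    # A statement of kind (4) unites the two lists under the connective.
    united = ("rel_complex", connective, [list1, list2])
    return ("rel2_complex", rel1, cncpt1, [united])


# Toy statements, invented for illustration.
s1 = ("rel2_complex", "action", "X", [("rel2", "object", "X", "state")])
s2 = ("rel2_complex", "action", "X", [("rel2", "goal", "X", "seizure")])
merged_or = merge_rel2_complex(s1, s2, "area_lcs")
merged_and = merge_rel2_complex(s1, s2, "same_fc")
```

Keeping the lattice test outside the function separates the purely syntactic merge from the FCA computation, so the same merge can be applied recursively to nested statements once their condition is known.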

The stated principles for generalizing statements of the kind (3) apply to statements of any complexity that enter the third argument of statements (3) and are defined recursively on the basis of (3) and (4). Since the cardinality n of the set of

ChFs corresponding required extension, does not depend on quantity k of generalized theories, a computing complexity of generalization's process of the given LM's theories depends exclusively from n and amounts O n k

 k

(at worst n it is equal to quantity of statements of a kind (2) and (3) at all levels of the description of LM by means of

(1)). As and O

 n k k

1 ,  , n

, O k n k k

1 under k

 n .

 n under k

1

Experimental validation

The proposed technique for generalizing theories (1) has been tested in the Visual Prolog 5.2 environment on independent lexicographic definitions of the LM of the Russian word “агрессор” (“aggressor”). The variants of the definitions are taken from the Great Soviet Encyclopedia and the thematic dictionary “War and peace” at http://slovari.yandex.ru, and also from [4]. The generalized theory of the LM for “агрессор” is shown in Fig. 2.

Fig. 2. The generalized theory of the LM for the Russian word “агрессор”

Prospects for further research are related to combining the approach proposed in this paper with methods for generalizing predicates on the basis of truth sets [1].

References

1. Mikhailov D.V., Emelyanov G.M. Model of language's sorts's system in a problem of a statement's semantic pattern's construction at a level of deep syntax // Taurian Herald for Computer Science and Mathematics. - 2006. - №1. - P.79-90 (in Russian).

2. Emelyanov G.M., Kornyshov A.N., Mikhailov D.V. Conceptually-situational modeling of process of synonymic transformation of the Natural Language statements as machine learning on the basis of precedents // Scientific-theoretical magazine “Artificial intelligence”. - 2006. - №2. - P.72-75 (in Russian).

3. Gerasimova I.A. Formal grammar and intensional logic. - Moscow: Russian Academy of Science, Institute of Philosophy, 2000 (in Russian).

4. Ganter B., Wille R. Formal Concept Analysis - Mathematical Foundations. - Berlin: Springer-Verlag, 1999.

5. Mel'cuk I.A., Zholkovsky A.K. Explanatory Combinatorial Dictionary of Modern Russian. Semantico-Syntactic Studies of Russian Vocabulary // Wiener Slawistischer Almanach, Sonderband 14, Wien, 1984.
