Semantic Space Creation and Associative Search Methods
for a Document Database of International Relations
*Shiori Sasaki, **Yasushi Kiyoki, ***Taizo Yakushiji
* Graduate School of Media and Governance, Keio University
** Faculty of Environmental Information, Keio University
*** Faculty of Law, Keio University
*, ** 5322 Endo, Fujisawa, Kanagawa, Japan
*** 2-15-45 Mita, Minato-ku, Tokyo, 108-8345 Japan
E-mail: s-sas@jcom.home.ne.jp, Kiyoki@mdbl.sfc.keio.ac.jp, yakushi@iips.org
Abstract
In this paper, we present a new method for creating a
semantic retrieval space for the field of International
Relations (IR). The method creates an integrated metadata
space for computing semantic relationships between words
in an IR lexicon and words in a general dictionary. The
created semantic space is applied to the previously
proposed mathematical model of meaning, which makes it
possible to compute semantic relationships between words
dynamically according to a given context. Using the
semantic space created by this method, we can search for
documents described in IR terms by using general words,
and for documents described in general words by using IR
terms.
Key Words
semantic associative search, international relations,
document database
1. Introduction
A large number of information resources are distributed
across the world-wide network environment. These resources
include many documents related to international relations
and world politics, ranging from official announcements of
governments and international organizations, policy
statements, parliamentary papers, press briefings and
activity reports of NGOs to informal remarks by
politicians. In this environment, one of the most
important issues for researchers of international
relations or world politics is how to extract appropriate
information according to their concerns and viewpoints.
However, it is difficult to obtain documents on the basis
of an interpretation of the meaning of words and data,
because general search engines, including category-based
retrieval on the WWW, adopt simple pattern matching. It is
also difficult for researchers of international relations
to analyze the semantic content of documents multilaterally
and dynamically, because existing methods in the field,
such as Content Analysis [1][2] and the Cognitive Map
[3][4], aim at knowledge discovery mainly from the static
character of documents.
Content Analysis and the Cognitive Map were introduced
into the study of international relations and world
politics in the 1960s and 1970s, and have been applied to
analyses of policy makers' cognition, attitudes and images
of other countries through published documents. In
particular, newspapers and party organs have been used as
objects of analysis to grasp trends in public opinion and
the policies of political parties. Speeches, statements,
exchanged documents and letters by heads of state have
also been used to analyze their cognition, because it is
difficult to interview them directly.
Content Analysis is a method that measures the frequency
with which a word or an encoded sentence appears in a
group of documents, and calculates the level of
correlation among words, codes and documents. It is
sometimes accompanied by multivariate analysis.
The Cognitive Map, on the other hand, is a method for
analyzing the logical routes of policy makers' cognition.
It regards the logical structure of a document as a
concept network in the mind of the author or speaker,
represents the causal relation between each pair of
concepts by “-”, “+” or “0”, and calculates the main
logical route and the highlighted concept.
However, these methods have weaknesses: Content Analysis
cannot measure the semantic relations between the words in
a group of documents, and the Cognitive Map cannot measure
the semantic relations between documents. In Content
Analysis, word frequencies are often measured by
mechanical pattern matching in order to process a large
amount of document data semi-automatically. The problem is
that the contextual meaning of a word in a document is not
reflected in its frequency. For instance, the measurement
does not reveal whether the word "engagement" is used to
mean an economic "contract" or a military "involvement".
Likewise, it is difficult to tell from the result whether
the single word "development" means "economic development"
or "development of weapons". The Cognitive Map, for its
part, shows the relations between concepts but is not
suitable for comparative analysis of a large number of
documents, because a map must be made for each document
and the strength of a relation cannot be shown
quantitatively.
Compared with these methods, the Semantic Associative
Search method, which is based on the mathematical model of
meaning proposed in [5][6][7], makes it possible to
extract significant information from many documents by
machine. In this method, information is acquired through
semantic computations, so that users can search for
documents dynamically according to their own contexts and
points of view. We therefore expect that, by applying the
Semantic Associative Search method to the analysis of
documents of international relations, researchers can
treat and analyze a large number of documents in their
original form, without encoding.
In this paper, to implement an environment that applies
the Semantic Associative Search method to the retrieval
and analysis of documents of international relations, we
present a method for creating a Semantic Retrieval Space
based on the mathematical model of meaning.
This method has two main features. First, it realizes a
semantic space for Semantic Associative Search that can be
applied to document analysis in international relations.
Second, it realizes a mechanism that measures the semantic
relations between technical terms and general words by
integrating the space constructed from source A (a
lexicon) with the space constructed from source B (a
dictionary). The former has methodological value for the
study of international relations, and the latter has value
for database engineering.
2. Outline of the Semantic Associative Search
method
In this section, we review the outline of the Semantic
Associative Search method based on the mathematical model
of meaning. This model has been presented in detail in
[5][6][7]. The creation method described in this paper
builds on this Semantic Associative Search method.
2.1 Creation of metadata space:
To provide the function of semantic associative search,
basic information on m data items ("data-items for space
creation") is given in the form of a matrix. Each data item
is provided as fragmentary metadata that is represented
independently of the others; no relationships between data
items need to be described. The information of each data
item is represented by its features. The m basic data items
are given in the form of an m by n matrix M, in which each
of the m data items is characterized by n features. By
using this matrix M, the orthogonal space is computed as
the metadata space MDS.
2.2 Representation of information resources in n-dimensional vectors:
Each of the information resources is represented as an
n-dimensional vector whose elements correspond to the n
features used in 2.1. These vectors are used as “metadata
for information resources”. The information resources
become the candidates for semantic associative search in
this model. Furthermore, each of the context words, which
are used to represent the user's impression and data
contents in semantic information retrieval, is also
represented as an n-dimensional vector. These vectors are
used as “metadata for contexts.”
First we construct the correlation matrix with respect to
the features. Then we execute the eigenvalue decomposition
of the correlation matrix and normalize the eigenvectors.
We define the image space MDS as the span of the
eigenvectors which correspond to nonzero eigenvalues. We
call such eigenvectors semantic elements hereafter. Since
the correlation matrix is symmetric, the semantic elements
form an orthonormal basis for MDS. The dimension v of the
image space MDS is identical to the rank of the data
matrix M. Since MDS is a v-dimensional Euclidean space,
various norms can be defined and a metric is naturally
introduced.
2.3 Mapping information resources into the metadata
space MDS:
Metadata items (data-items for space creation, metadata
for information resources and metadata for context words)
which are represented as n-dimensional vectors are mapped
into the orthogonal metadata space.
We consider the set of all the projections from the image
space MDS to the invariant subspaces (eigenspaces). We
refer to such a projection as a semantic projection and to
the corresponding projected space as a semantic subspace.
Since the number of i-dimensional invariant subspaces is
v(v-1)...(v-i+1)/i!, the total number of semantic
projections is 2^v. That is, this model can express 2^v
different phases of meaning.
2.4 Semantic associative search:
When a sequence of context words that determines the
user's impression and data contents is given, the
information resource most closely related to the given
context is extracted from the set of metadata items for
information resources in the metadata space.
Suppose a sequence sl of l words (context words) that
determines the context is given. We construct an operator
Sp to determine the semantic projection according to the
context. Context words are given as a sequence of several
keywords, each defined as an n-dimensional vector, to
specify the query for information retrieval.
We call the operator a semantic operator.
(a) First we map the l context words in databases to the
image space MDS. This mathematically means that
we execute the Fourier expansion of the sequence sl
in MDS and seek the Fourier coefficients of the
words with respect to the semantic elements. This
corresponds to seeking the correlation between each
context word of sl and each semantic element.
(b) Then we sum up the values of the Fourier coefficients
for each semantic element. This corresponds to
finding the correlation between the sequence sl and
each semantic element. Since we have v semantic
elements, we can constitute a v dimensional vector.
We call the vector normalized in the infinity norm the
semantic center of the sequence sl.
(c) If the sum obtained in (b) for a semantic element is
greater than a given threshold ε, we employ the
semantic element to form the projected semantic
subspace. We define the semantic projection by the
sum of such projections.
This operator automatically selects the semantic subspace
which is highly correlated with the sequence sl of the l
context words which determines the context.
This model makes dynamic semantic interpretation
possible. We emphasize here that, in our model, the
“meaning” is the selection of the semantic subspace,
namely, the selection of the semantic projection, and the
“interpretation” is the best approximation in the selected
subspace.
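As an illustration of 2.1 to 2.4, the following is a minimal sketch in Python with NumPy, not the implementation used in this work. It assumes that the correlation matrix is the product of the transposed data matrix and the data matrix, that a semantic element is selected when the absolute value of its summed coefficient exceeds the threshold (the signs of eigenvectors returned by a numerical solver are arbitrary), and that resources are ranked by the norm of their projection onto the selected subspace. The function names, the threshold value and the toy data are hypothetical.

```python
import numpy as np


def create_mds(M, tol=1e-10):
    """Return an n x v matrix whose columns are the semantic elements."""
    M = np.asarray(M, dtype=float)
    corr = M.T @ M                           # correlation matrix w.r.t. the n features
    eigvals, eigvecs = np.linalg.eigh(corr)  # symmetric matrix: orthonormal eigenvectors
    keep = eigvals > tol * eigvals.max()     # keep eigenvectors of nonzero eigenvalues
    return eigvecs[:, keep]


def semantic_center(Q, context_vectors):
    """Steps (a) and (b): summed coefficients, normalized in the infinity norm."""
    coeffs = np.asarray(context_vectors, dtype=float) @ Q  # (a) one row per context word
    total = coeffs.sum(axis=0)                             # (b) sum per semantic element
    peak = np.abs(total).max()
    return total / peak if peak > 0 else total


def rank_resources(Q, resources, context_vectors, eps=0.5):
    """Step (c), then ranking by the norm of each resource in the selected subspace."""
    center = semantic_center(Q, context_vectors)
    selected = np.abs(center) > eps          # assumption: compare magnitudes with eps
    scores = {
        name: float(np.linalg.norm((np.asarray(vec, dtype=float) @ Q)[selected]))
        for name, vec in resources.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): the four features are [war, weapon, trade, tariff].
M = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
Q = create_mds(M)                            # rank(M) = 3, so Q has 3 columns
resources = {"doc_arms": [0, 1, 0, 0], "doc_trade": [0, 0, 1, 1]}
ranking = rank_resources(Q, resources, context_vectors=[[1, 1, 0, 0]])
```

With the context vector for "war" and "weapon", only the semantic element dominated by those two features passes the threshold in this toy example, so the resource described by "weapon" is ranked above the one described by "trade" and "tariff".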
3. Outline of the creation method of a
Semantic Retrieval Space for IR documents
This method aims to create a semantic retrieval space
intended for a group of documents containing the technical
terms of international relations (hereinafter called
“IR”). Using the semantic space created by this method, we
can search for documents described in IR terms by using
general words as the query, and for documents described in
general words by using IR technical terms.
The method requires a lexicon that explains the technical
terms of the field, a dictionary that explains general
words, and a document database concerning IR as the
retrieval target. In other words, this space creation
method can also be applied to other specific fields,
provided that a lexicon of the field and a general
dictionary exist.
The schema of the space is represented in the form of a
matrix, as an ordered set of “basic words” and an ordered
set of “feature words”. The basic words form the vertical
elements (rows) of the matrix, and the feature words form
the horizontal elements (columns). We call a basic word
that is an IR technical term an “IR basic term (wIR)”, and
a basic word that is a general word a “general basic word
(wG)”. We also call a feature word that represents a
relation to a basic word a "related feature word (fr)",
and a feature word that represents the definition of a
basic word a "defining feature word (fd)".
                        Related feature words       Defining feature words
                        fr-1  fr-2  ...  fr-m       fd-1  fd-2  ...  fd-n
Technical basic terms   IR-Mr                       IR-Md
wIR-1, wIR-2, ...,      (IR basic matrix)           (Step 2: Defining technical
wIR-k                                               terms by general words)
General basic words     G-Mr (reference)            G-Md (definition)
wG-1, wG-2, ...,        (Step 1: Relating general   (general words matrix)
wG-l                    words to technical terms)
Figure 1: Structure of the space creation
Figure 1 shows the structure of the space creation
method. IR-Mr is a metadata matrix that shows the
relations between IR terms: for k given IR basic terms
(wIR-1, wIR-2, …, wIR-k), each term is characterized by m
related feature words of the IR field (fr-1, fr-2, …,
fr-m). G-Md is a metadata matrix that shows the
definitions of general basic words: for l given general
basic words, each word is characterized by n defining
feature words (fd-1, fd-2, …, fd-n). To create the
integrated matrix IR/G-Mrd, part G-Mr and part IR-Md are
added to IR-Mr and G-Md. Part IR-Md is a partial matrix
that shows the relations between IR basic terms and
defining feature words, and part G-Mr is a partial matrix
that shows the relations between general basic words and
the related feature words of IR terms.
3.1 Creation of an IR basic matrix IR-Mr
To create matrix IR-Mr, a set of feature words sufficient
to express the IR field is needed. Using a lexicon of IR,
the technical terms that appear in the explanation of each
entry are extracted as the ordered set of related feature
words. Next, every entry is extracted from the lexicon as
the ordered set of IR basic terms. Then each IR basic term
is characterized by the related feature words: “1” is set
for a related feature word that appears in a positive
sense in the explanation, “-1” for one that appears in a
negative sense, and “0” for one that does not appear in
the explanation. Through this process, the IR basic matrix
IR-Mr, which shows the relations between IR basic terms
and related feature words, is created.
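The construction in 3.1 can be sketched as follows. This is a minimal illustration in Python rather than the procedure actually used: it assumes that the sense of each related term (+1 or -1) has already been annotated by hand, and the function name `build_ir_basic_matrix` and the input format are hypothetical. The values for "arms control" follow the example given in 4.1, while the entry for "arms race" is invented for illustration.

```python
def build_ir_basic_matrix(annotated_lexicon):
    """Build IR-Mr from {entry: {related term: +1 or -1}}."""
    basic_terms = list(annotated_lexicon)                 # rows: lexicon entries
    feature_words = sorted(                               # columns: related terms
        {word for related in annotated_lexicon.values() for word in related}
    )
    matrix = [
        # 0 is set for a related feature word absent from the explanation
        [annotated_lexicon[term].get(word, 0) for word in feature_words]
        for term in basic_terms
    ]
    return basic_terms, feature_words, matrix


# "arms control" follows the example in 4.1; "arms race" is hypothetical.
lexicon = {
    "arms control": {"deterrence": 1, "disarmament": 1, "Cold War": 1},
    "arms race": {"deterrence": 1, "disarmament": -1},
}
terms, features, ir_mr = build_ir_basic_matrix(lexicon)
```

Here each row of `ir_mr` corresponds to one IR basic term and each column to one related feature word, in the order returned in `terms` and `features`.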
3.2 Integration of the IR basic matrix and the general
words matrix
To combine matrix IR-Mr and matrix G-Md, we create part
G-Mr and part IR-Md.
Step 1: Relating the general words to the technical terms
For the creation of part G-Mr, the l general basic words
(wG-1, wG-2, …, wG-l) are characterized by the m related
feature words of the IR field (fr-1, fr-2, …, fr-m).
Step 2: Defining the technical terms by the general words
For the creation of part IR-Md, the k IR basic terms
(wIR-1, wIR-2, …, wIR-k) are characterized by the n
defining feature words (fd-1, fd-2, …, fd-n).
Step 3: Adding other words to the vertical elements
Words that exist in neither matrix IR-Mr nor matrix G-Md
but appear frequently in the document groups are added to
the vertical elements as basic words, and are
characterized by the defining feature words and the
related feature words of IR.
By these processes, the integrated matrix IR/G-Mrd is
created.
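The three steps above amount to assembling a block matrix and then appending rows. The following Python sketch shows one way to do this, under the assumption that the four parts are held as lists of rows; the function names `integrate` and `add_basic_word` and the small example values are hypothetical.

```python
def integrate(ir_mr, ir_md, g_mr, g_md):
    """Assemble IR/G-Mrd = [[IR-Mr, IR-Md], [G-Mr, G-Md]] as a list of rows."""
    if len(ir_mr) != len(ir_md) or len(g_mr) != len(g_md):
        raise ValueError("row counts of paired parts must match")
    top = [r + d for r, d in zip(ir_mr, ir_md)]      # k rows for IR basic terms
    bottom = [r + d for r, d in zip(g_mr, g_md)]     # l rows for general basic words
    rows = top + bottom
    if len({len(row) for row in rows}) > 1:
        raise ValueError("every row must have m + n columns")
    return rows


def add_basic_word(matrix, related_values, defining_values):
    """Step 3: append one extra basic word characterized by both feature sets."""
    row = list(related_values) + list(defining_values)
    if matrix and len(row) != len(matrix[0]):
        raise ValueError("new row must have m + n columns")
    return matrix + [row]


# Hypothetical sizes: k = 2 IR terms, l = 2 general words, m = 2, n = 3.
ir_mr = [[1, 0], [0, 1]]
ir_md = [[1, 1, 0], [0, 1, 1]]      # created in Step 2
g_mr = [[1, 1], [0, 0]]             # created in Step 1
g_md = [[1, 0, 0], [0, 0, 1]]
integrated = integrate(ir_mr, ir_md, g_mr, g_md)
integrated = add_basic_word(integrated, [0, 1], [0, 1, 0])   # Step 3
```

The result has k + l + 1 rows and m + n columns, with the related feature words in the first m columns and the defining feature words in the remaining n columns.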
4. Realization of Semantic Space for
International Relations
4.1 Creation of an IR basic matrix
As an example of realizing a semantic space by the method
shown in 3.1, we used the Dictionary of International
Relations [8] (hereinafter called "IR-Dic."), which is
widely used in the study of IR. This lexicon explains 716
technical terms by their definitions, sources, history and
relevance to other terms. Each of the 716 entries was
extracted as an IR basic term (wIR). From the explanatory
note of each entry, only the related terms were extracted
as related feature words (fr). The values were determined
in the way shown in 3.1. Through this process, the IR
basic matrix was created. For example, for the term "arms
control", the value 1 is set for related feature words
such as "capability", "actor", "crisis management",
"deterrence", "disarmament", "Cold War", "superpower",
"non-proliferation", "ABC weapons" and "security regime".
This IR basic matrix expresses the relevance between the
terms in IR-Dic.
The result was a 712 x 712 matrix consisting of 712 basic
words and 712 feature words. The space created from this
matrix consists of 710-dimensional vectors. We call this
space the IR space.
4.2 Integration of the IR basic matrix and the general
words matrix
A matrix created from a general dictionary is combined
with the IR basic matrix. We used the Longman Dictionary
of Contemporary English [9] (hereinafter called
"Longman-Dic."), which explains about 56000 general words
using about 2000 basic words. We selected 2115 basic words
as both the general basic words (wG) and the defining
feature words (fd). A 2115 x 2115 matrix was then created,
which represents the definitions of general words in
Longman-Dic.
Step 1: Relating the general words to the technical terms
Part G-Mr is created by the process shown in 3.2, Step 1.
As an example, the general basic word "arms" is
characterized by IR related feature words such as "arms
control", "arms race" and "arms sales". This
characterization was checked by specialists in the IR
field on the basis of their knowledge.
Step 2: Defining the technical terms by the general words
Part IR-Md is created by the process shown in 3.2, Step 2.
For the characterization, we extracted the verbs and nouns
from the terminological definition in the explanatory note
of IR-Dic. As an example, the IR basic term "arms control"
is characterized by defining feature words such as "arms",
"control", "reduce", "remove", "weapon", "threat" and
"force". When a word in IR-Dic. was not among the defining
feature words, we looked it up in Longman-Dic. and
extracted the verbs and nouns from its explanation.
Step 3: Adding other basic words to the vertical elements
Important words such as "democracy", "economy" and
"policy", which are neither IR basic terms nor general
basic words but appear frequently in documents, were added
to the vertical elements and characterized by the related
feature words and defining feature words. We used
Longman-Dic. and IR-Dic. for this process.
As a result, a new integrated matrix was created, which
has about 2000+712 basic words as the vertical elements
and 2861 feature words as the horizontal elements. The
space created from this matrix consists of
2846-dimensional vectors. We call this space the
integrated space.
5. Experiment
To verify the feasibility and effectiveness of the
integrated space, we performed several experiments.
Experiment 1: Comparison of correlation between each
word in the IR space and the integrated space
Experiment 2: Application experiment of document
retrieval in the integrated space
5.1 Experiment 1
5.1.1 Evaluation method
Experiment1-1: We selected the IR technical term “arms
control” as the query keyword and retrieved correlated
words in the IR space created by the process of 4.1 and in
the integrated space created by the process of 4.2. Then
we compared the top 30 retrieval results in both spaces.
The results are shown in Table 1.
Table 1: Comparison of the retrieval result in the IR space
and the integrated space
      IR space                                    Integrated space
rank  retrieved word                correlation   retrieved word                correlation
1     NPT                           0.422383      weapon                        0.597043
2     arms control                  0.386465      NPT                           0.411523
3     nuclear proliferation         0.367211      nuclear weapons               0.401119
4     inspection                    0.354512      nuclear proliferation         0.370773
5     non-proliferation             0.345976      INF treaty                    0.35418
6     proliferation                 0.344098      non-proliferation             0.346241
7     nuclear weapons               0.32252       tactical nuclear weapons      0.332596
8     preventive war                0.315053      START II                      0.331209
9     second strike                 0.30641       CTBT                          0.321286
10    non offensive defence         0.305555      Cuban missile crisis          0.312302
11    MAD                           0.30432       proliferation                 0.308877
12    deterrence                    0.303463      START I                       0.305525
13    CTBT                          0.303237      SALT                          0.297272
14    Cuban missile crisis          0.29737       weapons of mass destruction   0.294429
15    accidental war                0.290768      cruise missile                0.294311
16    limited nuclear war           0.288547      second strike                 0.285025
17    parity                        0.287821      massive retaliation           0.27672
18    force                         0.283384      arms control                  0.268601
19    realism                       0.279878      missile                       0.268427
20    verification                  0.277056      deterrence                    0.266044
~     ~                             ~             ~                             ~
29    tactical nuclear weapons      0.261701      flexible response             0.242601
30    chemical and biological war   0.260292      horizontal proliferation      0.23927
Experiment1-2: First, we selected 15 IR technical terms
such as “weapons of mass destruction”, “economic
liberalism” and “non-tariff barriers” as keywords from
each issue area of IR: security, political economy,
international organization, human rights, global
environmental problems, theoretical perspective and
concept. Second, for these keywords, we retrieved words in
the IR space in the same way as in Experiment 1-1, and
fixed the top 10 words as correct answers. Then we
measured the ratio of the correct answers ranked in the
top 10 and in the top 20 in the integrated space. The
result is shown in Figure 2.
[Figure 2: bar chart showing, for each of the 15 query
terms, the correct rate in the top 10 and the correct rate
in the top 20. Vertical axis: 0% to 120%. Query terms on
the horizontal axis: weapons of mass destruction, arms
control, economic liberalism, non-tariff barriers, WTO,
EU, UN, humanitarian intervention, ethnic cleansing,
ecology/ecopolitics, INGO, neorealism, polarity, regime,
parity.]
Figure 2: The rate of the retrieval correctness
in the integrated space
Experiment1-3: We selected the general words "trade",
"environment" and "human" as keywords of queries and
checked each result of the top 10 in the integrated space.
The result is shown in Table 2.
Table 2: The retrieval result in the integrated space
      keyword: trade                 keyword: environment               keyword: human
rank  retrieved word    correlation  retrieved word        correlation  retrieved word        correlation
1     GATT              0.46484      organization          0.447723     human                 0.501031
2     free trade area   0.458932     environment           0.445309     ethnic cleansing      0.4802
3     protectionism     0.447085     pollution             0.403463     genocide              0.430277
4     tariff            0.43101      green movements       0.347933     Genocide Convention   0.415171
5     Tokyo round       0.407401     ecology/ecopolitics   0.308913     immigration           0.367203
6     trade             0.389198     INGO                  0.30539      international law     0.35866
7     free trade        0.380862     north/south           0.302781     nationalism           0.352227
8     common market     0.379662     globalization         0.255842     nation                0.329126
9     quota             0.363634     NATO                  0.244127     ethnic nationalism    0.328147
10    Kennedy round     0.359655     Earth                 0.239894     balkanization         0.325926
5.1.2 Experimental results
Experiment1-1: The experimental results show that the
integrated space realizes high-quality retrieval for IR
terms. For the query “arms control,” words such as
"weapon", "INF treaty", "START II", "START I", "weapons of
mass destruction" and "cruise missile", which are
semantically close to “arms control” but are not included
in the top 30 in the IR space, are included in the top 30
in the integrated space. At the same time, words such as
"tactical nuclear weapons" and "CTBT", which are also
semantically close to “arms control,” are ranked higher in
the integrated space.
For example, in the IR basic matrix the IR basic term "INF
treaty" is characterized only by related feature words
such as “nuclear weapons”, “arms control”, “START I”,
“START II”, “nineteen-eighty-nine”, “Warsaw Pact”, “bloc”,
“constructive engagement”, “perception”, “Gorbachev
doctrine” and “Cold War”. In the integrated matrix, it is
also characterized by defining feature words such as
“arms”, “weapon”, “missile”, “treaty”, “agreement”,
“remove” and “zero”, which it shares with “arms control”.
That is why “INF treaty” was reasonably ranked higher in
the integrated space than in the IR space.
Experiment1-2: For more than half of the queries, the rate
of correct answers included in the top 10 retrieval
results in the integrated space was 70%. Furthermore, for
all queries, the correct answers were included in the top
20 at a high rate of 80% to 100%.
As an example, for the query “economic liberalism”,
“protectionism”, “GATT”, “free trade”, “common market”,
“economic liberalism”, “tariff”, “free trade area”, “Tokyo
Round”, “WTO” and “quota” ranked in the top 10 in the IR
space, whereas in the integrated space "Kennedy Round" and
"non-tariff barrier" ranked in the top 10 in place of
"common market" and "WTO". This means that the integrated
space maintains retrieval quality without breaking the
fundamental structure of the IR space.
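The correct rate used in Experiment 1-2 can be illustrated with the "economic liberalism" example. The sketch below is in Python; the function name `correct_rate` is hypothetical, and the order of the words in the integrated-space list is illustrative only, since the text reports which words entered and left the top 10 but not their ranks.

```python
def correct_rate(reference_top, ranking, cutoff):
    """Share of reference words found within the first `cutoff` ranked words."""
    retrieved = set(ranking[:cutoff])
    return sum(word in retrieved for word in reference_top) / len(reference_top)


# Top 10 in the IR space for "economic liberalism", as reported in the text.
ir_top10 = ["protectionism", "GATT", "free trade", "common market",
            "economic liberalism", "tariff", "free trade area",
            "Tokyo Round", "WTO", "quota"]
# Integrated-space top 10: the same words with two replacements (order illustrative).
integrated_top10 = ["protectionism", "GATT", "free trade", "Kennedy Round",
                    "economic liberalism", "tariff", "free trade area",
                    "Tokyo Round", "non-tariff barrier", "quota"]
rate_top10 = correct_rate(ir_top10, integrated_top10, cutoff=10)
```

For this query, eight of the ten correct answers remain in the top 10 of the integrated space, so `rate_top10` is 0.8.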
Experiment1-3: For all the queries that use general words
as keywords, IR technical terms related to them ranked in
the top 10 in the integrated space. This means that it is
possible to retrieve a document described in IR technical
terms by using general words as keywords.
As an example, for the general-word query "environment",
IR terms such as "pollution", "green movements",
"ecology", "INGO" and "globalization" ranked in the top 10
in the integrated space. This is because these IR terms
are characterized by defining feature words such as “air”,
“water”, “green”, “people”, “plant” and “Earth”, which
they share with “environment” in the integrated matrix.
Moreover, for each query, general words reflecting the
common knowledge of the IR field were ranked high. This
shows that the characterization of general words by
technical terms was done appropriately.
5.1.3 Analysis
From these results, it was verified that the
characterization, including both defining and relating, is
made more appropriately in the integrated space than in
the IR space. It was also verified that not only IR terms
but also general words that reflect the knowledge of the
IR field can be retrieved by both general words and IR
technical terms.
5.2 Experiment 2
In Experiment 2, we retrieved documents in the integrated
space created through the process of 4.2.
5.2.1 Evaluation method
We collected 40 documents concerning IR from the WWW as
retrieval candidates and prepared three kinds of metadata
sets for the documents: 1) IR technical terms only, 2)
general words only, and 3) both IR terms and general
words. As an example, metadata sets of pattern 3) are
shown in Figure 3. We also classified the keywords for
queries into three kinds: 1) IR terms only, 2) general
words only, and 3) both IR terms and general words.
Moreover, three kinds of reference words were prepared in
the same way. The 3 x 3 combinations used in the
experiment are shown in Table 3.
ID      metadata of document
doc1    trade system, free trade, tariff, regime, import, product, trade1, standard1, success…
doc2    aid, north-south, LDCs, fourth-world, poor, people1, aids, die1…
~       ~
doc15   human rights, war, intervention, communal conflict, prisoner, race, soldier, rights, attack ...
doc16   epistemic communities, futurology, ...global governance, future, government, theory, idea…
~       ~
doc39   water, natural resources, world-politics, environment, stop, dirty, air, water, protect, earth…
doc40   armistice, war, conflict, demilitarization, intelligence, stop, attack, fight, information …
Figure 3: Examples of metadata
Table 3: Combination of metadata and keyword for
queries
                           Keywords for query
Metadata of document       IR terms   General words   IR terms + general words
IR terms                   I          II              III
General words              IV         V               VI
IR terms + general words   VII        VIII            IX
Next, we selected the IR terms “conflict” and “crisis” and
the general words “attack” and “crash” as keywords for
queries, and fixed eight documents about security or
conflict as correct answers in advance. We assigned them
the IDs doc5, doc10, doc15, doc20, doc25, doc30, doc35 and
doc40. The retrieval results for the nine kinds of
combination are shown in Table 4.
5.2.2 Experimental results
The case in which documents with metadata of both IR terms
and general words were retrieved by keywords of both IR
terms and general words shows the best retrieval quality
(lower-right cell: IX). Even when the documents were given
only IR-term metadata or only general-word metadata, they
showed a relatively high relevance ratio for the keyword
set of both IR terms and general words (upper-right cell:
III and middle-right cell: VI). In the case in which
documents with only IR-term metadata were retrieved by
general-word keywords (upper-middle cell: II), and the
case in which documents with only general-word metadata
were retrieved by IR-term keywords (middle-left cell: IV),
the relevance ratio was not so high, but at least 5 of the
correct documents ranked in the top 10.
Table 4: The retrieval result according to the nine kinds
of combination

metadata: IR terms
      keyword: IR terms     keyword: General words   keyword: IR terms + General words
rank  ID      correlation   ID      correlation      ID      correlation
1     doc40   0.502226      doc5    0.501295         doc5    0.544352
2     doc5    0.492371      doc40   0.392284         doc40   0.535234
3     doc15   0.431048      doc15   0.355318         doc15   0.45911
4     doc20   0.337915      doc20   0.284476         doc20   0.343914
5     doc25   0.268766      doc37   0.202732         doc25   0.256491
6     doc35   0.246094      doc11   0.201073         doc11   0.242029
7     doc11   0.241361      doc9    0.199328         doc35   0.236115
8     doc2    0.233853      doc17   0.199207         doc2    0.235324
9     doc30   0.231785      doc23   0.195791         doc30   0.233203
10    doc33   0.220733      doc25   0.193684         doc33   0.199303

metadata: General words
      keyword: IR terms     keyword: General words   keyword: IR terms + General words
rank  ID      correlation   ID      correlation      ID      correlation
1     doc40   0.510731      doc10   0.512244         doc40   0.552874
2     doc30   0.424205      doc20   0.446176         doc30   0.45841
3     doc10   0.259057      doc40   0.433106         doc10   0.32358
4     doc20   0.24301       doc30   0.364944         doc20   0.295688
5     doc24   0.222061      doc14   0.284089         doc24   0.218491
6     doc35   0.198955      doc5    0.277443         doc35   0.204756
7     doc5    0.188389      doc8    0.26781          doc5    0.199559
8     doc26   0.181802      doc25   0.264337         doc25   0.178692
9     doc32   0.181352      doc23   0.249578         doc26   0.169355
10    doc11   0.179863      doc15   0.24773          doc6    0.16754

metadata: IR terms + General words
      keyword: IR terms     keyword: General words   keyword: IR terms + General words
rank  ID      correlation   ID      correlation      ID      correlation
1     doc5    0.472904      doc5    0.493925         doc5    0.52478
2     doc40   0.468055      doc20   0.394756         doc40   0.498384
3     doc15   0.390095      doc40   0.378979         doc15   0.416298
4     doc30   0.342379      doc15   0.340018         doc30   0.3652
5     doc20   0.311306      doc10   0.322031         doc20   0.341837
6     doc25   0.264962      doc30   0.299353         doc25   0.258615
7     doc35   0.255268      doc14   0.271628         doc35   0.250516
8     doc33   0.222828      doc17   0.254037         doc10   0.23775
9     doc38   0.219296      doc23   0.245184         doc2    0.209467
10    doc10   0.21916       doc8    0.236924         doc24   0.207355
5.2.3 Analysis
From these results, it was verified that, in the
integrated space, documents described in IR terms can be
retrieved by using general words, and documents described
in general words can be retrieved by using IR terms.
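The relevance counts behind this analysis can be recomputed from the document IDs listed in Table 4. The following Python sketch does so; the assignment of each ranked list to a cell (I to IX) is an assumption based on our reading of the table layout, and the names `CORRECT`, `TOP10` and `hits_in_top` are hypothetical.

```python
CORRECT = {5, 10, 15, 20, 25, 30, 35, 40}    # document IDs fixed as correct answers

# Top-10 document IDs per cell, read from Table 4 (cell assignment is an assumption).
TOP10 = {
    "I":    [40, 5, 15, 20, 25, 35, 11, 2, 30, 33],
    "II":   [5, 40, 15, 20, 37, 11, 9, 17, 23, 25],
    "III":  [5, 40, 15, 20, 25, 11, 35, 2, 30, 33],
    "IV":   [40, 30, 10, 20, 24, 35, 5, 26, 32, 11],
    "V":    [10, 20, 40, 30, 14, 5, 8, 25, 23, 15],
    "VI":   [40, 30, 10, 20, 24, 35, 5, 25, 26, 6],
    "VII":  [5, 40, 15, 30, 20, 25, 35, 33, 38, 10],
    "VIII": [5, 20, 40, 15, 10, 30, 14, 17, 23, 8],
    "IX":   [5, 40, 15, 30, 20, 25, 35, 10, 2, 24],
}


def hits_in_top(ranking, correct, cutoff=10):
    """Number of correct documents among the first `cutoff` retrieved documents."""
    return sum(doc in correct for doc in ranking[:cutoff])


hits = {cell: hits_in_top(docs, CORRECT) for cell, docs in TOP10.items()}
```

Under this reading, cells II and IV contain 5 and 6 of the eight correct documents in the top 10, and cell IX contains all 8, which is consistent with the description in 5.2.2.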
6. Conclusion
In this paper, we presented a method for creating a
semantic retrieval space for documents of international
relations. With this method, a new integrated space can be
created from a matrix constructed from a general
dictionary and a matrix constructed from a lexicon of a
specific field of study. The feasibility and accuracy of
this space creation method were verified by applying it to
semantic associative search. Using this semantic retrieval
space, researchers can retrieve and analyze documents
dynamically according to their concerns or viewpoints, and
can calculate the semantic relations between data such as
words and documents as correlation values that reflect the
accumulated knowledge of IR.
Documents of international relations include wide-ranging
information across time and space. Quick survey and
appropriate acquisition of information will be required
more and more as time goes on. We are convinced of the
importance and urgency of analyzing documents of various
ages and forms and of various actors (countries, regions,
organizations, governments, groups and individuals), and
we believe that our method will help to obtain useful
information from the past for future needs. As future
work, we will extend this integrated semantic space with a
mechanism for treating time-series document data, so as to
analyze changes in actors' cognition, attitudes and
values.
References
[1] Ole R. Holsti, “Content Analysis,” Gardner Lindzey and
Elliot Aronson eds., The Handbook of Social Psychology,
1968, 596-632.
[2] Ole R. Holsti and Robert C. North, “Comparative Data
from Content Analysis: Perception of History and Economic
Variables in the 1914 Crisis,” Richard L. Merritt and Stein
Rokkan eds., Comparing Nations: The Use of Quantitative
Data in Cross-National Research, 1966, 169-190.
[3] Robert Axelrod ed., The Structure of Decision: The
Cognitive Maps of Political Elites, Princeton U. P., 1976.
[4] Christer Jonsson ed., Cognitive Dynamics and
International Politics (London: Frances Pinter, 1982).
[5] Kitagawa, T. and Kiyoki, Y.: The mathematical model of
meaning and its application to multidatabase systems,
Proceedings of 3rd IEEE International Workshop on
Research Issues on Data Engineering: Interoperability in
Multidatabase Systems, April 1993, 130-135.
[6] Kiyoki, Y., Kitagawa, T. and Hayama, T.: A metadatabase
system for semantic image search by a mathematical
model of meaning, ACM SIGMOD Record, Vol. 23, No. 4,
1994, 34-41.
[7] Kiyoki, Y., Kitagawa, T. and Hitomi, Y.: A fundamental
framework for realizing semantic interoperability in a
multidatabase environment, Journal of Integrated
Computer-Aided Engineering, Vol. 2, No. 1, Jan. 1995, 3-20.
[8] Evans, Graham and Newnham, Jeffrey: Dictionary of
International Relations (Penguin Books, 1998).
[9] Longman Dictionary of Contemporary English (Longman,
1987).