14 pt type size, 10 words max, uppercase, bold, centered

Semantic Space Creation and Associative Search Methods for a Document Database of International Relations

* Shiori Sasaki, ** Yasushi Kiyoki, *** Taizo Yakushiji
* Graduate School of Media and Governance, Keio University
** Faculty of Environmental Information, Keio University
*** Faculty of Law, Keio University
*, ** 5322 Endo, Fujisawa, Kanagawa, Japan
*** 2-15-45 Mita, Minato-ku, Tokyo, 108-8345 Japan
E-mail: s-sas@jcom.home.ne.jp, Kiyoki@mdbl.sfc.keio.ac.jp, yakushi@iips.org

Abstract
In this paper, we present a new method for creating a semantic retrieval space for the field of International Relations (IR). The method creates an integrated metadata space for computing semantic relationships between words in an IR lexicon and words in a general dictionary. The created semantic space is applied to the mathematical model of meaning, which has already been proposed. This model makes it possible to compute semantic relationships between words dynamically according to a given context. Using the semantic space created by this method, we can search for documents that consist of IR terms by using general words, and search for documents that consist of general words by using IR terms.

Key Words
semantic associative search, international relations, document database

1. Introduction
A large number of information resources are distributed in the world-wide network environment. These resources include many documents that relate to international relations and world politics, ranging from official announcements of governments and international organizations, policy statements, parliamentary papers, press briefings and activity reports of NGOs to informal remarks by politicians. In this environment, one of the most important issues for researchers of international relations or world politics is how to extract appropriate information according to their concerns and viewpoints.
However, it is difficult to obtain documents together with an interpretation of the meaning of their words and data, because general search engines, including category retrieval on the WWW, adopt simple pattern matching. It is also difficult for researchers of international relations to analyze the semantic content of documents multilaterally and dynamically, because existing methods in the field, such as Content Analysis [1][2] and the Cognitive Map [3][4], aim at knowledge discovery mainly from the static character of documents.

Content Analysis and the Cognitive Map were introduced into the study of international relations and world politics in the 1960s and 1970s, and have been applied to analyses of policy makers' cognition, attitudes or images of other countries through published documents. In particular, newspapers and party organs have been used as objects of analysis to grasp trends in public opinion and the policies of political parties. Speeches, statements, exchanged documents and letters by the heads of each country have been used to analyze their cognition, because of the difficulty of interviewing them directly.

Content Analysis is a method that measures the appearance frequency of a word or an encoded sentence in document groups and calculates the correlation level of each word, code and document. It is sometimes accompanied by multivariate analysis. The Cognitive Map, on the other hand, is a method for analyzing the logical routes of policy makers' cognition. It regards the logical structure of a document as a concept network in the mind of the author or speaker, shows the causal relations between concepts by "-", "+" and "0", and calculates the main logical route and the highlighted concept.
However, these methods have weaknesses: Content Analysis cannot measure the semantic relations between the words included in document groups, and the Cognitive Map cannot measure the semantic relations between documents. In Content Analysis, to treat a large amount of document data semi-automatically, the appearance frequency of words is often measured by mechanical pattern matching. The problem is that the contextual meaning of a word in a document is not reflected in its appearance frequency. For instance, whether the word "engagement" is used to mean an economic "contract" or a military "involvement" is not apparent from the measurement result. Likewise, it is difficult to tell from the result whether the single word "development" means "economic development" or "development of weapons". The Cognitive Map, on the other hand, shows the relations between concepts but is not suitable for comparative analysis of a large number of documents, because a map must be made for each document and the strength of a relation cannot be shown quantitatively.

Compared with these methods, the Semantic Associative Search method, based on the mathematical model of meaning already proposed in [5][6][7], makes it possible to extract significant information from multiple document data mechanically. In this method, information is acquired by semantic computations, so that users can search documents dynamically according to their own contexts and points of view. We therefore expected that applying the Semantic Associative Search method to the analysis of documents of international relations would allow researchers to treat and analyze a large number of documents in their original form, without encoding.
In this paper, to implement an environment that applies the Semantic Associative Search method to the retrieval and analysis of documents of international relations, we show a creation method for the Semantic Retrieval Space based on the mathematical model of meaning. This method has two main features. First, it realizes a semantic space for Semantic Associative Search that can be applied to document analysis in international relations. Second, it realizes a mechanism that measures the semantic relations between technical terms and general words by integrating the space constructed from source A (a lexicon) and the space constructed from source B (a dictionary). The former has methodological value for the study of international relations; the latter has value for database engineering.

2. Outline of the Semantic Associative Search method

In this section, we review the outline of the Semantic Associative Search method based on the mathematical model of meaning. This model has been presented in detail in [5][6][7]. The creation method described in this paper is realized on top of this Semantic Associative Search method.

2.1 Creation of metadata space: To provide the function of semantic associative search, basic information on m data items ("data-items for space creation") is given in the form of a matrix. Each data item is provided as fragmentary metadata that is represented independently of the others. No relationship between data items needs to be described. The information of each data item is represented by its features. The m basic data items are given in the form of an m by n matrix M: each of the m data items is characterized by n features. By using this matrix M, the orthogonal space is computed as the metadata space MDS.

2.2 Representation of information resources in n-dimensional vectors: Each of the information resources is represented as an n-dimensional vector whose elements correspond to the n features used in 2.1. These vectors are used as "metadata for information resources". The information resources become the candidates for semantic associative search in this model. Furthermore, each of the context words, which are used to represent the user's impression and data contents in semantic information retrieval, is also represented as an n-dimensional vector. These vectors are used as "metadata for contexts".

The metadata space MDS is computed from M as follows. First we construct the correlation matrix with respect to the features. Then we execute the eigenvalue decomposition of the correlation matrix and normalize the eigenvectors. We define the image space MDS as the span of the eigenvectors that correspond to nonzero eigenvalues. We call such eigenvectors semantic elements hereafter. Since the correlation matrix is symmetric, the semantic elements form an orthonormal basis for MDS. The dimension v of the image space MDS is identical to the rank of the data matrix M. Since MDS is a v-dimensional Euclidean space, various norms can be defined and a metric is naturally introduced.

2.3 Mapping information resources into the metadata space MDS: Metadata items (data-items for space creation, metadata for information resources and metadata for context words), which are represented as n-dimensional vectors, are mapped into the orthogonal metadata space. We consider the set of all the projections from the image space MDS to the invariant subspaces (eigenspaces). We refer to such a projection as a semantic projection and to the corresponding projected space as a semantic subspace. Since the number of i-dimensional invariant subspaces is v(v-1)...(v-i+1)/i!, the total number of semantic projections is 2^v. That is, this model can express 2^v different phases of meaning.

2.4 Semantic associative search: When a sequence of context words that determines the user's impression and data contents is given, the information resource most closely related to the given context is extracted from the set of metadata items for information resources in the metadata space.

Suppose a sequence sl of l words (context words) that determines the context is given. Context words are given as a sequence of several keywords, defined as n-dimensional vectors, that specify the query for information retrieval. We construct an operator Sp to determine the semantic projection according to the context. We call this operator a semantic operator.

(a) First we map the l context words in databases to the image space MDS. Mathematically, this means that we execute the Fourier expansion of the sequence sl in MDS and seek the Fourier coefficients of the words with respect to the semantic elements. This corresponds to seeking the correlation between each context word of sl and each semantic element.

(b) Then we sum up the values of the Fourier coefficients for each semantic element. This corresponds to finding the correlation between the sequence sl and each semantic element. Since we have v semantic elements, we can constitute a v-dimensional vector. We call this vector, normalized in the infinity norm, the semantic center of the sequence sl.

(c) If the sum obtained in (b) for a semantic element is greater than a given threshold ε, we employ the semantic element to form the projected semantic subspace. We define the semantic projection as the sum of such projections.

This operator automatically selects the semantic subspace that is highly correlated with the sequence sl of the l context words that determines the context. This model makes dynamic semantic interpretation possible.
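The outline in Section 2 can be illustrated with a small pure-Python sketch. It is not the implementation of [5][6][7]; it rests on several assumptions that the text leaves open: the correlation matrix is taken to be M^T M, a textbook cyclic Jacobi iteration stands in for whatever eigen-solver is actually used, the threshold ε is applied to the absolute value of the normalized semantic center (the sign of a numerically computed eigenvector is arbitrary), and candidates are ranked by the norm of their projection onto the selected subspace. All function names and the toy data are hypothetical.

```python
import math


def correlation_matrix(m):
    """Return M^T M (features x features); assumed form of the correlation matrix."""
    n = len(m[0])
    return [[sum(row[i] * row[j] for row in m) for j in range(n)] for i in range(n)]


def jacobi_eigh(a, sweeps=100, tol=1e-20):
    """Eigen-decompose a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column j of v is the unit eigenvector for eigenvalue j.
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v


def semantic_elements(m, tol=1e-9):
    """Eigenvectors with nonzero eigenvalue: an orthonormal basis of MDS."""
    values, vectors = jacobi_eigh(correlation_matrix(m))
    n = len(values)
    return [[vectors[k][j] for k in range(n)] for j in range(n) if values[j] > tol]


def coefficients(x, elements):
    """Fourier coefficients of x: inner products with each semantic element."""
    return [sum(xi * qi for xi, qi in zip(x, q)) for q in elements]


def semantic_center(context, elements):
    """Steps (a)-(b): sum the coefficients of the context words, then
    normalize the result in the infinity norm."""
    sums = [sum(col) for col in zip(*(coefficients(w, elements) for w in context))]
    peak = max(abs(s) for s in sums)
    return [s / peak for s in sums] if peak else sums


def select_subspace(center, eps):
    """Step (c): indices of semantic elements kept for the projection.
    Uses |value| > eps, an assumption made because eigenvector signs are arbitrary."""
    return [j for j, c in enumerate(center) if abs(c) > eps]


def rank(items, context, elements, eps):
    """Order item names by the norm of their projection onto the selected
    subspace (an assumed ranking metric)."""
    chosen = select_subspace(semantic_center(context, elements), eps)

    def score(vector):
        c = coefficients(vector, elements)
        return math.sqrt(sum(c[j] ** 2 for j in chosen))

    return sorted(items, key=lambda name: score(items[name]), reverse=True)


# Toy data matrix M: 3 data items x 3 features; the third feature is never used,
# so the rank of M, and hence the dimension v of MDS, is 2.
M = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
elements = semantic_elements(M)
context = [[1, 1, 0]]                     # one context word
center = semantic_center(context, elements)
chosen = select_subspace(center, 0.5)
ranking = rank({"doc_a": [1, 1, 0], "doc_b": [1, -1, 0]}, context, elements, 0.5)
```

In this toy case the context word lies along one semantic element only, so a single element is selected and doc_a, which shares that direction, ranks ahead of doc_b, which is orthogonal to it.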
We emphasize here that, in our model, the "meaning" is the selection of the semantic subspace, namely the selection of the semantic projection, and the "interpretation" is the best approximation in the selected subspace.

3. Outline of the creation method of a Semantic Retrieval Space for IR documents

This method aims to create a semantic retrieval space intended for a group of document data containing the technical terms of international relations (hereinafter called "IR"). Using the semantic space created by this method, we can search for documents that consist of IR terms by using general words as queries, and search for documents that consist of general words by using IR technical terms. The method requires a lexicon that explains the technical terms of the field, a dictionary that explains general words, and a document database concerning IR as a retrieval target. In other words, this space creation method can also be applied to other specific fields, provided that a lexicon of the field and a general dictionary exist.

The schema of the space is represented in the form of a matrix as an ordered set of "basic words" and an ordered set of "feature words". The ordered set of basic words forms the vertical elements (rows) of the matrix, and the ordered set of feature words forms the horizontal elements (columns). We define a basic word that is an IR technical term as an "IR basic term (wIR)" and a basic word that is a general word as a "general basic word (wG)". We also define a feature word that represents a relation to basic words as a "related feature word (fr)" and a feature word that represents the definition of a basic word as a "defining feature word (fd)".
                               Related feature words        Defining feature words
                               fr-1 fr-2 ... fr-m           fd-1 fd-2 ... fd-n
Technical basic terms          IR-Mr                        IR-Md
wIR-1, wIR-2, ..., wIR-k       (IR basic matrix)            (Step 2: defining technical terms by general words)
General basic words            G-Mr                         G-Md
wG-1, wG-2, ..., wG-l          (Step 1: relating general    (general words matrix)
                               words to technical terms)
                               reference                    definition

Figure 1: Structure of the space creation

Figure 1 shows the structure of the space creation method. IR-Mr is a metadata matrix that shows the relations between IR terms. For given k IR basic terms (wIR-1, wIR-2, ..., wIR-k), each term is characterized by m related feature words of the IR field (fr-1, fr-2, ..., fr-m). G-Md is a metadata matrix that shows the definitions of general basic words. For given l general basic words, each word is characterized by n defining feature words (fd-1, fd-2, ..., fd-n).

To create the integrated matrix IR/G-Mrd, part G-Mr and part IR-Md are added to IR-Mr and G-Md. Part IR-Md is a partial matrix that shows the relations between IR basic terms and defining feature words. Part G-Mr is a partial matrix that shows the relations between general basic words and the related feature words of IR terms.

3.1 Creation of an IR basic matrix IR-Mr

To create matrix IR-Mr, a set of feature words sufficient to express the IR field is needed. Using a lexicon of IR, the technical terms that appear in the explanation of each entry are extracted as the ordered set of related feature words. Next, every entry of the lexicon is extracted as the ordered set of IR basic terms. Then, each IR basic term is characterized by the related feature words: "1" is set for a related feature word that appears in a positive sense in the explanation, "-1" is set for a related feature word that appears in a negative sense, and "0" is set for a related feature word that does not appear in the explanation.
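The characterization rule of 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon explanations are assumed to have been annotated by hand into "positive" and "negative" lists (the paper does not say how the sense is determined), the function name build_basic_matrix is hypothetical, and the "arms race" entry is invented toy data. The positive feature words for "arms control" are taken from the example given in 4.1.

```python
def build_basic_matrix(entries):
    """Build a basic matrix from annotated lexicon entries.

    entries maps each basic term to {"positive": [...], "negative": [...]},
    a hypothetical hand-annotated form of the lexicon explanations.
    Returns (basic terms, related feature words, matrix of 1 / -1 / 0).
    """
    # Ordered set of related feature words, in first-seen order.
    features = []
    for senses in entries.values():
        for word in senses["positive"] + senses["negative"]:
            if word not in features:
                features.append(word)
    column = {word: j for j, word in enumerate(features)}

    terms = list(entries)                 # ordered set of basic terms
    matrix = []
    for term in terms:
        row = [0] * len(features)         # 0: does not appear in the explanation
        for word in entries[term]["positive"]:
            row[column[word]] = 1         # 1: appears in a positive sense
        for word in entries[term]["negative"]:
            row[column[word]] = -1        # -1: appears in a negative sense
        matrix.append(row)
    return terms, features, matrix


# Toy entries: "arms control" follows the example in 4.1; "arms race" is invented.
entries = {
    "arms control": {"positive": ["deterrence", "disarmament", "Cold War"], "negative": []},
    "arms race": {"positive": ["Cold War"], "negative": ["disarmament"]},
}
terms, features, ir_mr = build_basic_matrix(entries)
```

Each row of ir_mr is the feature vector of one basic term over the shared, ordered list of related feature words.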
Through this process, the IR basic matrix IR-Mr, which shows the relations between the IR basic terms and the related feature words, is created.

3.2 Integration of the IR basic matrix and the general words matrix

To combine matrix IR-Mr and matrix G-Md, we create part G-Mr and part IR-Md.

Step 1: Relating the general words to the technical terms
For the creation of part G-Mr, the l general basic words (wG-1, wG-2, ..., wG-l) are characterized by the m related feature words of the IR field (fr-1, fr-2, ..., fr-m).

Step 2: Defining the technical terms by the general words
For the creation of part IR-Md, the k IR basic terms (wIR-1, wIR-2, ..., wIR-k) are characterized by the n defining feature words (fd-1, fd-2, ..., fd-n).

Step 3: Adding other words to the vertical elements
Words that exist in neither matrix IR-Mr nor matrix G-Md but appear frequently in the document groups are added to the vertical elements as basic words and are characterized by the defining feature words and the related feature words of IR.

By these processes, the integrated matrix IR/G-Mrd is created.

4. Realization of Semantic Space for International Relations

4.1 Creation of an IR basic matrix

As an example of realizing a semantic space by the method shown in 3.1, we referred to the Dictionary of International Relations [8] (hereinafter called "IR-Dic."), which is widely used in the study of IR. This lexicon explains 716 technical terms by their definitions, sources, history and relevance to other terms. Each of the 716 entry terms was extracted as an IR basic term (wIR). From the explanatory note of each entry, only the related terms were extracted as related feature words (fr). The values were determined in the way shown in 3.1. Through this process, the IR basic matrix was created.
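The integration steps of 3.2 amount to assembling a block matrix from the four parts in Figure 1 and appending the Step 3 rows. A minimal sketch, assuming each part is already available as a list of rows; the function name integrate and all numeric values are hypothetical.

```python
def integrate(ir_mr, ir_md, g_mr, g_md, extra_rows=()):
    """Assemble the integrated matrix IR/G-Mrd from its four parts.

    ir_mr: k x m, ir_md: k x n, g_mr: l x m, g_md: l x n.
    extra_rows: Step 3 rows of length m + n for additional basic words.
    """
    if len(ir_mr) != len(ir_md) or len(g_mr) != len(g_md):
        raise ValueError("paired blocks must have the same number of rows")
    top = [r + d for r, d in zip(ir_mr, ir_md)]       # rows of IR basic terms
    bottom = [r + d for r, d in zip(g_mr, g_md)]      # rows of general basic words
    matrix = top + bottom + [list(row) for row in extra_rows]
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("every row must have m + n columns")
    return matrix


# Toy sizes: k = 2 IR basic terms, l = 2 general basic words,
# m = 2 related feature words, n = 3 defining feature words.
ir_mr = [[1, 0], [0, 1]]
ir_md = [[1, 0, 1], [0, 1, 0]]       # Step 2: technical terms defined by general words
g_mr = [[1, 1], [0, 0]]              # Step 1: general words related to technical terms
g_md = [[1, 0, 0], [0, 0, 1]]
extra = [[0, 1, 0, 1, 1]]            # Step 3: one additional basic word
integrated = integrate(ir_mr, ir_md, g_mr, g_md, extra)
```

The result has k + l + 1 rows and m + n columns, so every basic word, technical or general, is described over the same set of feature words.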
For example, for the term "arms control", the value 1 is set for related feature words such as "capability", "actor", "crisis management", "deterrence", "disarmament", "Cold War", "superpower", "non-proliferation", "ABC weapons" and "security regime". This IR basic matrix expresses the relevance between the terms in IR-Dic. It became a 712 x 712 matrix consisting of 712 basic words and 712 feature words. The space created from this matrix consists of 710-dimensional vectors. We call this space the IR space.

4.2 Integration of the IR basic matrix and the general words matrix

A matrix created from a general dictionary is combined with the IR basic matrix. We referred to the Longman Dictionary of Contemporary English [9] (hereinafter called "Longman-Dic."), which explains about 56000 general words by about 2000 basic words. We selected 2115 basic words as both the general basic words (wG) and the defining feature words (fd). A 2115 x 2115 matrix, which represents the definitions of general words in Longman-Dic., is then created.

Step 1: Relating the general words to the technical terms
Part G-Mr is created by the process shown in Step 1 of 3.2. As an example, the general basic word "arms" is characterized by IR related feature words such as "arms control", "arms race" and "arms sales". This characterization is checked by specialists in the IR field according to their knowledge.

Step 2: Defining the technical terms by the general words
Part IR-Md is created by the process shown in Step 2 of 3.2. For the characterization, we extracted the verbs and nouns from the terminological definition in the explanatory note of IR-Dic. As an example, the IR basic term "arms control" is characterized by defining feature words such as "arms", "control", "reduce", "remove", "weapon", "threat" and "force". When the definition in IR-Dic. contained a word that was not among the defining feature words, we looked the word up in Longman-Dic. and extracted the verbs and nouns from its explanation.

Step 3: Adding other basic words to the vertical elements
Important words such as "democracy", "economy" and "policy", which exist in neither the IR basic terms nor the general basic words but frequently appear in documents, were added to the vertical elements and characterized by the related feature words and the defining feature words. We used Longman-Dic. and IR-Dic. for this process.

As a result, a new integrated matrix was created, which has about 2000+712 basic words in the vertical elements and 2861 feature words in the horizontal elements. The space created from this matrix consists of 2846-dimensional vectors. We call this space the integrated space.

5. Experiment

To verify the feasibility and effectiveness of the integrated space, we performed several experiments.

Experiment 1: Comparison of the correlations between words in the IR space and in the integrated space
Experiment 2: Application experiment of document retrieval in the integrated space

5.1 Experiment 1

5.1.1 Evaluation method

Experiment 1-1: We selected the IR technical term "arms control" as the query keyword and retrieved correlated words in the IR space created by the process of 4.1 and in the integrated space created by the process of 4.2. Then, we compared the top 30 retrieval results in both spaces. The result is shown in Table 1.
Table 1: Comparison of the retrieval result in the IR space and the integrated space

        IR space                                       Integrated space
rank    retrieved word                  correlation    retrieved word                  correlation
1       NPT                             0.422383       weapon                          0.597043
2       arms control                    0.386465       NPT                             0.411523
3       nuclear proliferation           0.367211       nuclear weapons                 0.401119
4       inspection                      0.354512       nuclear proliferation           0.370773
5       non-proliferation               0.345976       INF treaty                      0.35418
6       proliferation                   0.344098       non-proliferation               0.346241
7       nuclear weapons                 0.32252        tactical nuclear weapons        0.332596
8       preventive war                  0.315053       START II                        0.331209
9       second strike                   0.30641        CTBT                            0.321286
10      non offensive defence           0.305555       Cuban missile crisis            0.312302
11      MAD                             0.30432        proliferation                   0.308877
12      deterrence                      0.303463       START I                         0.305525
13      CTBT                            0.303237       SALT                            0.297272
14      Cuban missile crisis            0.29737        weapons of mass destruction     0.294429
15      accidental war                  0.290768       cruise missile                  0.294311
16      limited nuclear war             0.288547       second strike                   0.285025
17      parity                          0.287821       massive retaliation             0.27672
18      force                           0.283384       arms control                    0.268601
19      realism                         0.279878       missile                         0.268427
20      verification                    0.277056       deterrence                      0.266044
～      ～                              ～             ～                              ～
29      tactical nuclear weapons        0.261701       flexible response               0.242601
30      chemical and biological war     0.260292       horizontal proliferation        0.23927

Experiment 1-2: First, we selected 15 IR technical terms, such as "weapons of mass destruction", "economic liberalism" and "non-tariff barriers", as keywords from each issue area of IR: security, political economy, international organization, human rights, global environmental problems, and theoretical perspectives and concepts. Second, for these keywords, we retrieved words in the IR space in the same way as in Experiment 1-1 and fixed the top 10 words as correct answers. Then, we measured the ratio of the correct answers ranked in the top 10 and in the top 20 in the integrated space. The result is shown in Figure 2.
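The measurement in Experiment 1-2 can be sketched as follows: the top 10 words of the reference (IR space) ranking are fixed as correct answers, and the rate is the share of them found within a cutoff of the integrated-space ranking. The function name correct_rate is hypothetical. The example uses the "economic liberalism" result reported in 5.1.2; the order of the integrated-space list is assumed, since only its membership is reported, and the top-10 rate does not depend on that order.

```python
def correct_rate(reference_ranking, target_ranking, cutoff, n_correct=10):
    """Share of the reference top-n words that appear in the target top-cutoff."""
    correct = set(reference_ranking[:n_correct])
    hits = sum(1 for word in target_ranking[:cutoff] if word in correct)
    return hits / len(correct)


# Top 10 for the query "economic liberalism" in the IR space (from 5.1.2).
ir_top10 = [
    "protectionism", "GATT", "free trade", "common market", "economic liberalism",
    "tariff", "free trade area", "Tokyo Round", "WTO", "quota",
]
# Integrated space: "Kennedy Round" and "non-tariff barrier" replace
# "common market" and "WTO"; the order shown here is an assumption.
integrated_top10 = [
    "protectionism", "GATT", "free trade", "Kennedy Round", "economic liberalism",
    "tariff", "free trade area", "Tokyo Round", "non-tariff barrier", "quota",
]
rate_top10 = correct_rate(ir_top10, integrated_top10, cutoff=10)
```

Calling the same function with cutoff=20 on a longer integrated-space ranking would give the top-20 rate.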
Figure 2: The rate of the retrieval correctness in the integrated space (correct rate in the top 10 and correct rate in the top 20 for each of the 15 query terms; vertical axis from 0% to 120%)

Experiment 1-3: We selected the general words "trade", "environment" and "human" as query keywords and checked the top 10 results for each in the integrated space. The result is shown in Table 2.

Table 2: The retrieval result in the integrated space

        keyword: trade                  keyword: environment                 keyword: human
rank    retrieved word   correlation    retrieved word        correlation    retrieved word        correlation
1       GATT             0.46484        organization          0.447723       human                 0.501031
2       free trade area  0.458932       environment           0.445309       ethnic cleansing      0.4802
3       protectionism    0.447085       pollution             0.403463       genocide              0.430277
4       tariff           0.43101        green movements       0.347933       Genocide Convention   0.415171
5       Tokyo round      0.407401       ecology/ecopolitics   0.308913       immigration           0.367203
6       trade            0.389198       INGO                  0.30539        international law     0.35866
7       free trade       0.380862       north/south           0.302781       nationalism           0.352227
8       common market    0.379662       globalization         0.255842       nation                0.329126
9       quota            0.363634       NATO                  0.244127       ethnic nationalism    0.328147
10      Kennedy round    0.359655       Earth                 0.239894       balkanization         0.325926

5.1.2 Experimental results

Experiment 1-1: The experimental results show that the integrated space realizes high-quality retrieval for IR terms. For the query "arms control", words such as "weapon", "INF treaty", "START II", "START I", "weapons of mass destruction" and "cruise missile", which are semantically close to "arms control" but are not included in the top 30 in the IR space, are included in the top 30 in the integrated space.
At the same time, words such as "tactical nuclear weapons" and "CTBT", which are semantically close to "arms control", rank higher in the integrated space. For example, the IR basic term "INF treaty" is characterized in the IR basic matrix only by related feature words such as "nuclear weapons", "arms control", "START I", "START II", "nineteen-eighty-nine", "Warsaw Pact", "bloc", "constructive engagement", "perception", "Gorbachev doctrine" and "Cold War", whereas in the integrated matrix it is also characterized by defining feature words such as "arms", "weapon", "missile", "treaty", "agreement", "remove" and "zero", which it shares with "arms control". That is why "INF treaty" was reasonably ranked higher in the integrated space than in the IR space.

Experiment 1-2: For more than half of the queries, the rate of correct answers included in the top 10 retrieval results in the integrated space was 70%. Furthermore, for all queries, the correct answers were included in the top 20 at a high rate of 80% to 100%. As an example, for the query "economic liberalism", the words "protectionism", "GATT", "free trade", "common market", "economic liberalism", "tariff", "free trade area", "Tokyo Round", "WTO" and "quota" formed the top 10 in the IR space, whereas in the integrated space "Kennedy Round" and "non-tariff barrier" ranked in place of "common market" and "WTO". This means that the integrated space keeps the retrieval quality without breaking the fundamental structure of the IR space.

Experiment 1-3: For all the queries that use general words as keywords, the IR technical terms related to them ranked in the top 10 in the integrated space. This means that it is possible to retrieve a document that consists of IR technical terms by using general words as keywords.
As an example, for the query with the general word "environment", IR terms such as "pollution", "green movements", "ecology", "INGO" and "globalization" ranked in the top 10 in the integrated space. This is because these IR terms are characterized by defining feature words such as "air", "water", "green", "people", "plant" and "Earth", which are shared with the features of "environment" in the integrated matrix. Moreover, for each query, general words reflecting the common knowledge of the IR field were selected at high ranks. This shows that the characterization of the general words by technical terms is done appropriately.

5.1.3 Analysis

From these results, it was verified that the characterization, including both defining and relating, is more appropriate in the integrated space than in the IR space. It was also verified that not only IR terms but also general words that reflect the knowledge of the IR field can be retrieved by both general words and IR technical terms.

5.2 Experiment 2

In Experiment 2, we retrieve documents in the integrated space created through the process of 4.2.

5.2.1 Evaluation method

We collected 40 documents concerning IR from the WWW as retrieval candidates and prepared three kinds of metadata sets for the documents: 1) only IR technical terms, 2) only general words, 3) both IR terms and general words. As an example, metadata sets of pattern 3) are shown in Figure 3. We also classified the keywords for queries into three kinds: 1) only IR terms, 2) only general words, 3) both IR terms and general words. Moreover, three kinds of reference words were prepared in the same way. The 3 x 3 combinations for the experiment are shown in Table 3.

ID      metadata of document
doc1    trade system, free trade, tariff, regime, import, product, trade1, standard1, success…
doc2    aid, north-south, LDCs, fourth-world, poor, people1, aids, die1…
～      ～
doc15   human rights, war, intervention, communal conflict, prisoner, race, soldier, rights, attack ...
doc16   epistemic communities, futurology, ...global governance, future, government, theory, idea…
～      ～
doc39   water, natural resources, world-politics, environment, stop, dirty, air, water, protect, earth…
doc40   armistice, war, conflict, demilitarization, intelligence, stop, attack, fight, information …

Figure 3: Examples of metadata

Table 3: Combination of metadata and keyword for queries

                              Keywords for query
Metadata of document          IR terms    General words    IR terms + general words
IR terms                      I           II               III
General words                 IV          V                VI
IR terms + general words      VII         VIII             IX

Next, we selected the IR terms "conflict" and "crisis" and the general words "attack" and "crash" as keywords for queries, and fixed eight documents about security or conflict as correct answers in advance. We assigned them the IDs doc5, doc10, doc15, doc20, doc25, doc30, doc35 and doc40. The retrieval results for the nine combinations are shown in Table 4.

Table 4: The retrieval result according to the nine kinds of combination

metadata: IR terms
        keyword: IR terms       keyword: General words    keyword: IR terms + General words
rank    ID      correlation     ID      correlation       ID      correlation
1       doc40   0.502226        doc5    0.501295          doc5    0.544352
2       doc5    0.492371        doc40   0.392284          doc40   0.535234
3       doc15   0.431048        doc15   0.355318          doc15   0.45911
4       doc20   0.337915        doc20   0.284476          doc20   0.343914
5       doc25   0.268766        doc37   0.202732          doc25   0.256491
6       doc35   0.246094        doc11   0.201073          doc11   0.242029
7       doc11   0.241361        doc9    0.199328          doc35   0.236115
8       doc2    0.233853        doc17   0.199207          doc2    0.235324
9       doc30   0.231785        doc23   0.195791          doc30   0.233203
10      doc33   0.220733        doc25   0.193684          doc33   0.199303

metadata: General words
        keyword: IR terms       keyword: General words    keyword: IR terms + General words
rank    ID      correlation     ID      correlation       ID      correlation
1       doc40   0.510731        doc10   0.512244          doc40   0.552874
2       doc30   0.424205        doc20   0.446176          doc30   0.45841
3       doc10   0.259057        doc40   0.433106          doc10   0.32358
4       doc20   0.24301         doc30   0.364944          doc20   0.295688
5       doc24   0.222061        doc14   0.284089          doc24   0.218491
6       doc35   0.198955        doc5    0.277443          doc35   0.204756
7       doc5    0.188389        doc8    0.26781           doc5    0.199559
8       doc26   0.181802        doc25   0.264337          doc25   0.178692
9       doc32   0.181352        doc23   0.249578          doc26   0.169355
10      doc11   0.179863        doc15   0.24773           doc6    0.16754

metadata: IR terms + General words
        keyword: IR terms       keyword: General words    keyword: IR terms + General words
rank    ID      correlation     ID      correlation       ID      correlation
1       doc5    0.472904        doc5    0.493925          doc5    0.52478
2       doc40   0.468055        doc20   0.394756          doc40   0.498384
3       doc15   0.390095        doc40   0.378979          doc15   0.416298
4       doc30   0.342379        doc15   0.340018          doc30   0.3652
5       doc20   0.311306        doc10   0.322031          doc20   0.341837
6       doc25   0.264962        doc30   0.299353          doc25   0.258615
7       doc35   0.255268        doc14   0.271628          doc35   0.250516
8       doc33   0.222828        doc17   0.254037          doc10   0.23775
9       doc38   0.219296        doc23   0.245184          doc2    0.209467
10      doc10   0.21916         doc8    0.236924          doc24   0.207355

5.2.2 Experimental results

The case in which documents with metadata of both IR terms and general words were retrieved by keywords of both IR terms and general words shows the best retrieval quality (lower-right cell: IX). Even when the documents were given only IR-term metadata or only general-word metadata, they achieved a relatively high relevance ratio with the keyword set of both IR terms and general words (upper-right cell: III and middle-right cell: VI). In the case in which documents with only IR-term metadata were retrieved by general-word keywords (upper-middle cell: II) and the case in which documents with only general-word metadata were retrieved by IR-term keywords (middle-left cell: IV), the relevance ratio was not so high, but at least 5 documents ranked in the top 10.

5.2.3 Analysis

From these results, it is verified that, in the integrated space, a document that consists of IR terms can be retrieved by using general words, and a document that consists of general words can be retrieved by using IR terms.

6. Conclusion

In this paper, we showed a method for creating a semantic retrieval space for documents of international relations. With this method, a new integrated space can be created from a matrix constructed from a general dictionary and a matrix constructed from a lexicon of a specific field of study. The feasibility and accuracy of this space creation method were verified by applying it to semantic retrieval. Using this semantic retrieval space, researchers can retrieve and analyze documents dynamically according to their concerns or viewpoints, and can calculate the semantic relations between data such as words and documents as amounts of correlation that reflect the accumulated knowledge of IR.

Documents of international relations include wide-ranging information in time and space. Quick surveying and appropriate acquisition of information will be required more and more as time goes on. We are convinced of the importance and urgency of analyzing documents of various ages and forms and of various actors (countries, regions, organizations, governments, groups and individuals), and we think that our method will help to obtain useful information from the past for future needs. As future work, we will extend this integrated semantic space with a mechanism for treating time-series document data, so as to analyze changes in actors' cognition, attitudes and values.

References
[1] Ole R. Holsti, "Content Analysis," in Gardner Lindzey and Elliot Aronson eds., The Handbook of Social Psychology, 1968, 596-632.
[2] Ole R. Holsti and Robert C. North, "Comparative Data from Content Analysis: Perception of History and Economic Variables in the 1914 Crisis," in Richard L. Merritt and Stein Rokkan eds., Comparing Nations: The Use of Quantitative Data in Cross-National Research, 1966, 169-190.
[3] Robert Axelrod ed., The Structure of Decision: The Cognitive Maps of Political Elites, Princeton U. P., 1976.
[4] Christer Jonsson ed., Cognitive Dynamics and International Politics, London: Frances Pinter, 1982.
[5] Kitagawa, T. and Kiyoki, Y.: The mathematical model of meaning and its application to multidatabase systems, Proceedings of 3rd IEEE International Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, April 1993, 130-135.
[6] Kiyoki, Y., Kitagawa, T. and Hayama, T.: A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, Vol. 23, No. 4, 1994, 34-41.
[7] Kiyoki, Y., Kitagawa, T. and Hitomi, Y.: A fundamental framework for realizing semantic interoperability in a multidatabase environment, Journal of Integrated Computer-Aided Engineering, Vol. 2, No. 1, Jan. 1995, 3-20.
[8] Evans, Graham and Newnham, Jeffrey: Dictionary of International Relations, Penguin Books, 1998.
[9] Longman Dictionary of Contemporary English, Longman, 1987.
