WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

INTEGRATING DISPARATE DATABASES FOR COMPOSITE-ANSWERS

Stuart E. Madnick
Y. Richard Wang*

September 1987    #WP 1936-87

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139

To appear in the Proceedings of the Hawaii International Conference on System Sciences, January 1988.

E53-320, Sloan School of Management
MIT, Cambridge, MA 02139
(617) 253-6671

* On leave from the MIS Department, College of BPA, University of Arizona, Tucson, AZ 85721

ABSTRACT

Many important information systems require multiple independent databases to work together within and/or across organizational boundaries in order to increase productivity. We refer to this type of system as Composite Information Systems (CIS). A key area of research in CIS is logical connectivity, which deals with the process of accessing disparate databases in concert for composite-answers. The major problems that need to be overcome to attain logical connectivity include contradiction, inconsistency, and ambiguity. This paper presents an approach to resolve these problems through enhancing the semantic power of the database integration. This approach exploits concepts drawn from frame-based knowledge representation and rule-based inferencing. An object-oriented prototype is also presented to illustrate the process involved in formulating composite-answers where different levels of abstraction are required.

KEY WORDS AND PHRASES: Distributed Database Systems, Object-Oriented Programming, Organizational Information Systems, Strategic Computing.

ACKNOWLEDGEMENTS

Work reported herein has been supported in part by the Department of Transportation's Transportation System Center, the U.S. Air Force, the Center for Management of Information at the University of Arizona, and Citibank.

1. Introduction

Significant advances in the price, speed-performance, capacity, and capabilities of new database technology have created a wide range of opportunities for commercial applications. These opportunities are now widely recognized as strategically important to corporations. It is also increasingly evident that the identification of strategic applications alone does not result in success for an organization. A careful and delicate interplay between choice of strategic applications, appropriate technology, and organizational responses must be made to attain success, as depicted in Figure 1 [2]. An effective information system is one that successfully aligns the problems and opportunities across these three domains.

Figure 1. Strategic Applications, Technology, and Organizational Research Initiative (SATORI), relating three domains: Strategic Applications, Technology, and Organization.

One important category of strategic applications involves inter-corporate linkage (e.g., tying into supplier and/or buyer systems) and/or intra-corporate integration (e.g., tying together disparate functional areas within a firm). This category of systems is referred to as Composite Information Systems (CIS) [2, 6, 14] hereinafter.

This paper addresses issues involved in the evolution of separate systems to a more fully integrated CIS in order to gain strategic advantage. Benefits of CIS and recent research activities are presented in the remainder of this section. Section 2 provides a case study of multiple tour-guide databases to illustrate the strategy for semantic integration of disparate database systems. Section 3 presents a solution that combines Data Base Management System (DBMS) and Artificial Intelligence / Expert Systems (AI/ES) techniques to facilitate logical connectivity. In particular, concepts drawn from frame-based knowledge representation and rule-based inferencing are exploited.
Furthermore, an object-oriented prototype system was developed to test this approach. Finally, concluding remarks are made in Section 4.

Benefits of CIS

A key idea of CIS is the integration of disparate databases for composite answers. Without integration, it is difficult, expensive, time-consuming, and error-prone to obtain key information which may be distributed in databases located in different divisions of different organizations.

Consider the following case study of a major international bank [2], as shown in Figure 2. Currently, three separate database systems are being used for loan management, cash management, and letters-of-credit processing. Suppose a client requests that $50,000 be transferred to another account. If that client's cash balances in the funds transfer system cannot cover that transaction, it will be rejected, even though that client may have a $20,000,000 active letter-of-credit! This rejection, besides being annoying to the client, will require significant effort to correct by manually drawing on the letter-of-credit to cover the funds transfer. If the bank can connect the three separate database systems together to access information in concert, so that funds can be automatically drawn on the letter-of-credit, then product-differentiation will be achieved via the enhanced quality of service, and reprocessing costs can be reduced since special manual intervention can be avoided.

Figure 2. An Electronic Banking System Without Integration (a loan management system, a cash management system, and a letter of credit system, each with its own database).

Recent Research Activities

Researchers in the DBMS and AI/ES fields have been striving to enhance their systems in a converging direction. The database community seeks additional semantic capability in their DBMS to "understand" more of the real world, while the AI/ES community has begun to address the increasing demand for database access. This converging trend has prompted researchers to address the connectivity problem from different perspectives. Furthermore, twelve methodologies for database schema integration have been analyzed and compared by Batini, Lenzerini, and Navathe [1]. We summarize below some recent research activities that benefited our work.

The open-systems group [5] deals with highly parallel, distributed, open systems. It is based on the belief that future computer applications will involve the interaction of subsystems that have been independently developed at disparate geographical locations. Message Passing Semantics is a methodology being developed to tackle connectivity.

The MULTIBASE research project [3, 15] attempts to provide a uniform interface through a single query language and database schema to data in pre-existing, heterogeneous, distributed databases. The Functional Data Model and the data language DAPLEX are used as the common data model and language to efficiently execute queries that may require data from different databases with different schemata, data models, and query languages.

The federated architecture research [4, 9] aims at uniting a collection of independent database systems into a loosely-coupled federation. The export-schema specifies the information that a component will share with other components, while the import-schema specifies the non-local information that a component wishes to manipulate.

Each of these research efforts has addressed certain aspects of connectivity.
In comparison, the research goal of the CIS research project [2, 6, 14] is twofold: (a) to develop a methodology for integrating disparate database systems with a particular focus on logical connectivity; and (b) to demonstrate the feasibility of a system capable of attaining logical connectivity in a real environment with satisfactory performance.

2. Developing CIS

Recent advances in DBMS and AI/ES have provided a new perspective to the design of CIS. At the external level, query languages such as QUEL and Query-By-Forms (QBF) in INGRES have been employed to provide user-friendly interfaces. Although these interfaces are powerful, the user still needs to follow certain formats. Since the purpose of accessing multiple databases is to formulate composite answers in order to achieve the user's objective, it will be even more powerful if the user can simply "query by objective (QBO)." A CIS should allow the user to query by a statement-of-objective which is processed by an Intelligent Front-End Processor (IFEP). The IFEP eventually produces a composite-query executable by the underlying system.

At the conceptual level, a composite-query will be processed to produce the required information, which may be distributed in disparate databases located in different divisions of different organizations. Integrating these disparate databases requires knowing where all the data are stored, facilitating all the necessary interfaces, constructing queries to retrieve the data, accumulating local results, resolving local differences, and integrating the results into composite-answers. MULTIBASE [15], for example, used a Global Data Manager (GDM) and Local Database Interfaces (LDI) for database integration to provide a uniform interface. Many other methodologies have been proposed in the past decade. They are categorized into two contexts(1): (a) view integration, which produces a global conceptual description of a proposed database; and (b) database integration, which produces the global schema of a collection of databases. This global schema is an integrated view of all databases in a distributed database environment [1].

(1) Multi-database interoperability has also been proposed as an alternative approach. It assumes that the databases the user may access basically have no global schema. The system should provide the user with functions for manipulating data that may be in visibly distinct schemata and may be mutually nonintegrated [8].

To develop strategic applications of databases quickly in order to obtain tangible benefits before a full-scale heterogeneous database integration is launched, the concept of virtual-drivers(2) can be applied, as delineated below. Note that the virtual-driver concept is equally applicable in an environment where each component is a distributed database management system such as R* or INGRES*.

(2) It is important not to confuse the concept of virtual-driver with virtual-terminal protocols. The virtual-terminal protocols have been invented to try to hide terminal idiosyncrasies from application programs through mapping of real terminals onto a hypothetical network virtual terminal. In contrast to this narrow mapping, the virtual-driver concept aims at accessing separate databases in concert to formulate composite answers.

Virtual-Drivers

In the virtual-driver concept [14], users can still use real terminals to perform their traditional functions, as shown in Figure 3. Meanwhile, virtual-drivers are created which are indistinguishable to the system from users of real terminals. A user interested in obtaining composite-answers from multiple databases invokes the CIS-Executive, which mediates among the virtual-drivers. The CIS-Executive invokes each system (via its virtual-driver) to obtain the necessary information. Incompatibilities among the database systems are resolved by the CIS-Executive, which then presents a composite-answer to the user.

Figure 3. Integrating Disparate Databases (processing components I, J, and K, each with its own database).

Two levels of connectivity need to be considered when using the virtual-driver approach, as discussed below. Physical connectivity refers to the process of actual communication among disparate databases. Although many R&D issues need to be addressed in physical connectivity (e.g., bandwidths, security, availability, and reliability) in order to ensure an effective CIS, we assume that communication solutions are available, and focus on logical connectivity, which is even more challenging. Logical connectivity refers to the process of accessing disparate databases in concert for composite-answers. The major problems that need to be overcome to attain logical connectivity include syntax and semantic contradiction, inconsistency, and ambiguity due to different assumptions made in disparate databases. For brevity, connectivity is used instead of logical connectivity hereinafter. A travel-guides case is used to illustrate connectivity strategy.

Connectivity Strategy

Travel-guides are easy to understand, abundant in data-semantics, and representative of the situation involved in CIS. Litwin and Abdellatif reported a prototype multi-database system using tour-guides for Paris as an example [8]. The conventional DDL/DML is extended in their prototype to incorporate the complexities introduced by the explicit recognition that a global schema is not available. Similar to MULTIBASE, connectivity in Litwin and Abdellatif's prototype is encoded by the DBA or application programmers via DDL/DML; it is therefore opaque to the end-user. In contrast, we exploit concepts drawn from frame-based knowledge representation scheme and rule-based inferencing as a strategy to attain connectivity. An advantage of this approach is that questions such as "why and how" can be answered upon the request of the end-user, making the connectivity-process more transparent to the end-user. Other advantages will be delineated along the way.

Three tour-guides are presented below to illustrate connectivity strategy. They are the AAA Tour-book, Massachusetts, 1987 (abbr. AAA hereinafter), FODOR's New England, 1987 (abbr. FODOR), and The Spirit of Massachusetts, 1987 (abbr. MASS). One might envisage that AAA is implemented in distributed INGRES*, where each state has a local physical database transparent across the states. Similarly, FODOR is implemented in ORACLE, and MASS in R*, by competing organizations. Our focus is on the semantics of each database, not the DBMS used to implement the database.

Interacting with a workstation which uses a CIS-Executive to drive these distributed databases, a travel agent may seek to meet the statement-of-objective established by a client in her office. Suppose that the client's statement-of-objective is to maximize the quality of facility with the minimum of cost. Suppose also that part of the composite-query produced by IFEP is to obtain a composite-answer of the facilities of Logan Airport Hilton in Boston from the tour-guide databases ("what are the facilities of Logan Airport Hilton in Boston"). Let us see how we may formulate a composite-answer for the query from the tour-guide databases with database schemata shown in Figure 4.

AAA Relations
  AAA-Info:       (Name*, Address, Rate-Code, Lodging-Type, Classification, Phone#, Other)
  AAA-Direction:  (Address*, Direction)
  AAA-Facility:   (Name*, Facility*)
  AAA-Credit:     (Name*, Credit-Card*)
  AAA-Rate:       (Name*, Season*, 1PL, 1PH, 2P1BL, 2P1BH, 2P2BL, 2P2BH, XP, F-code)

FODOR Relations
  FODOR-Info:     (ID#*, Name, Address, Comment, Location, Package, Category)
  FODOR-Phone:    (ID#*, Phone#*)
  FODOR-Facility: (ID#*, Facility*)
  FODOR-Service:  (ID#*, Service*)

MASS Relations
  MASS-Info:      (Name*, Address, Facility-Type, Rating, #-of-Rooms, Other)
  MASS-Phone:     (Name*, Phone#*)
  MASS-CC:        (Name*, ...)
  MASS-Amenity:   (Name*, Amenity-code*)
  MASS-Package:   (Name*, Package-Name*, Package-Descript)

Figure 4. Relational Schemata for AAA, FODOR, and MASS

In order to retrieve the data, the schemata and data dictionaries of AAA, FODOR, and MASS need to be accessed. The relations in AAA, FODOR, and MASS and the data-formats are exemplified in Table 1. The CIS-Executive can be designed to decompose the query (i.e., "What are the facilities of Logan Airport Hilton in Boston?") into subqueries which perform operations such as SELECT, JOIN, and PROJECT on AAA, FODOR, and MASS. As a result, data of Logan Airport Hilton can be accumulated, as shown in Table 2. The reader may have recognized that it is necessary to realize that amenity in MASS equates to facility in FODOR and AAA. In addition, the amenity-code in MASS has to be converted (e.g., 6 means pool) in order to construct the MASS-column of the facilities in Table 2. Although such mappings are not automatic on most current DBMS, these transformations are fairly straightforward and are performed by experimental systems such as MULTIBASE.

Pursuing the case further, we observe that FODOR reports less information than the other two guides. It is possible that FODOR simply does not emphasize facilities, but specializes in other aspects such as decor.

[...]

[...] guides has very different meanings. Example sources of difference include whether tax, breakfast, service charge, and gratuities are included or not.

• Inconsistency. The name, address, and phone number are reported as follows:

  AAA:   Logan Airport Hilton; Logan International Airport, East Boston, 02123; (617) 569-9300
  FODOR: Hilton Inn at Logan; Logan Int'l Airport; 569-9300
  MASS:  The Logan Airport Hilton; Logan International Airport, Boston, 02123; (617) 569-9300

The twelve methodologies reported by Batini et al. [1] also dealt with some subset of these issues. The tour-guides case presented above provides a cohesive setting to discuss these issues and the corresponding connectivity strategy, as elaborated below.

To resolve these issues, it is necessary to map synonyms and convert different data formats. This can be accomplished, in part, through data dictionaries. View definition and integration techniques can be applied to provide a more comprehensive view from the partial, incomplete information in the local databases. Through comparison, conforming, and integration of the schemata, a global conceptual schema may be proposed and tested against the following criteria: (a) completeness and correctness; (b) minimality; and (c) understandability [1]. This process has two problems: (a) it requires extensive manual effort; and (b) the resulting integrated schema does not contain explicit knowledge of the assumptions made.

To resolve these two problems, it is necessary to understand the concepts that underlie the data. With these concepts in mind, reasonable connections may be established based on the content of the database(s). We refer to this type of problem as semantic reconciliation. Connectivity strategy can be employed to solve semantic-reconciliation problems in order to formulate composite-answers. Two parts of this strategy are illustrated below: (a) identifying the same object without a global identifier; and (b) making a judgment call based on credibility.

(A) Identifying the same object without a global identifier

A unique global key identifier may not exist when multiple databases are involved.
For example, there does not exist a primary key identifier which is defined consistently and completely across the tour-guide databases. In order for the subqueries (decomposed by the CIS-Executive discussed earlier) to perform the necessary database operations, a unique ID for "Logan Airport Hilton" is needed, be it an ID-number, a name, an address, or a phone-number. Note that the name used in FODOR is "Hilton Inn at Logan." If "Logan Airport Hilton" is used as the identifier to retrieve facility from the FODOR database, nothing will be returned.

A moment of thought would lead one to make the connection between phone numbers and a hotel. Let us assume that a hotel may have more than one phone, but does not share the same phone number with another hotel. Using "Logan Airport Hilton" as the name for the hotel, the phone number (617) 569-9300 could be retrieved from AAA given the relations in Figure 4. Using (617) 569-9300 as a reference, the phone numbers 569-9300 for FODOR and (617) 569-9300 for MASS could also be retrieved.

The remaining question is whether "617" is also the area code for 569-9300 in FODOR. In FODOR, the hotel which has the phone number 569-9300 is classified as located in the Boston area, which has area code 617. Consequently, the three hotels have exactly the same phone number, i.e., (617) 569-9300. Since a hotel does not share the same phone with another by assumption, we conclude that it is the same hotel. As a result, facility data can be retrieved from the databases.

Thus, to identify the same object in disparate databases without an explicit global identifier requires us to make extensive use of inference with knowledge in the application domain.

(B) Making a judgment call based on credibility

Contradiction, granularity, inconsistency, and ambiguity are unavoidable when integrating disparate databases. What should we do when different sources of information disagree? How can we make our system sensitive to different perspectives? An informed approach based on the credibility of the local databases can help the delivery process. The fact that AAA specializes in room-rate, FODOR specializes in decor and location, and MASS specializes in handicapped service is an example of credibility which can be used to make a judgment call when needed. We give another concrete example below.

As mentioned earlier, AAA indicates that Logan Airport Hilton has color TV without cable, but MASS reports that cable TV is available, contradicting each other. A closer examination reveals that AAA is more specific. It has three categories for TV: C/TV for color TV, CATV for cable TV, and C/CATV for color cable TV, whereas MASS only indicates if cable TV is available. Therefore, it is fair to say that AAA is more detailed and thus more likely to be credible in reporting TV information. In addition, AAA reports that the hotel offers movies for a fee (assuming that the CIS-Executive is able to interpret the data accumulated in Table 3). If the CIS-Executive also "knows" that cable TV means fee movies from special stations such as HBO, then it would conclude that "Cable TV" in MASS means "fee for cable movies such as HBO." That is, the hotel has color TV without cable, but offers movies such as HBO for a fee.

Based on the analysis presented in this section, a composite-answer is formulated below for the facilities of Logan Airport Hilton. The reader may wish to propose variations of the composite-answer.

"(1) free parking; (2) color TV without cable, but with fee for movies, e.g. HBO; (3) phone in room; (4) air conditioning; (5) pool; (6) airport transportation available; (7) restaurant; (8) non-smokers' room; and (9) pets allowed. In addition, the following facilities have been reported: suites, smoke detectors, entertainments, dancing weekends, bath or shower, attractive furnishing, cocktail bar lounge, near public transportation, and handicapped accessible."
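The two parts of the connectivity strategy discussed in this section, (A) identifying the same hotel through its phone number and (B) making a judgment call based on credibility, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the code of any system described in this paper: the record layout, the names `GUIDES`, `AREA_CODE_BY_LOCATION`, `CREDIBILITY`, `normalize_phone`, `find_same_hotel`, and `resolve` are all hypothetical, and the credibility ranking simply encodes the observation that AAA reports TV information in more detail than MASS.

```python
import re

# Illustrative records only. Field names are assumptions; the values are the
# name and phone entries quoted in the text for the three guides.
GUIDES = {
    "AAA": [{"name": "Logan Airport Hilton", "phone": "(617) 569-9300"}],
    "FODOR": [{"name": "Hilton Inn at Logan", "phone": "569-9300",
               "location": "Boston"}],
    "MASS": [{"name": "The Logan Airport Hilton", "phone": "(617)569-9300"}],
}

# Domain knowledge used for inference: the Boston area has area code 617.
AREA_CODE_BY_LOCATION = {"Boston": "617"}

# Hypothetical per-attribute ranking: the guide that reports an attribute in
# finer detail is listed first and is trusted more for that attribute.
CREDIBILITY = {"tv": ["AAA", "MASS"]}


def normalize_phone(record):
    """Return a 10-digit phone string, inferring a missing area code
    from the record's location when only 7 digits are given."""
    digits = re.sub(r"\D", "", record["phone"])
    if len(digits) == 7:
        area = AREA_CODE_BY_LOCATION.get(record.get("location"))
        if area is None:
            return None  # area code cannot be inferred; leave unmatched
        digits = area + digits
    return digits if len(digits) == 10 else None


def find_same_hotel(reference_guide, name):
    """Part (A): use the phone number as a surrogate key across guides.

    Relies on the assumption stated in the text that a hotel does not
    share a phone number with another hotel.
    """
    ref = next(r for r in GUIDES[reference_guide] if r["name"] == name)
    key = normalize_phone(ref)
    matches = {}
    for guide, records in GUIDES.items():
        for rec in records:
            if normalize_phone(rec) == key:
                matches[guide] = rec["name"]
    return key, matches


def resolve(attribute, reports):
    """Part (B): keep the report of the most credible guide that has one."""
    for guide in CREDIBILITY[attribute]:
        if guide in reports:
            return guide, reports[guide]
    raise KeyError(attribute)


if __name__ == "__main__":
    print(find_same_hotel("AAA", "Logan Airport Hilton"))
    print(resolve("tv", {"AAA": "color TV without cable",
                         "MASS": "cable TV available"}))
```

A table-driven ranking is only one way to encode credibility; a rule-based formulation would allow the judgment to depend on more than the attribute name.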
"(1) free parking; (2l color conditioning, (4) The tour-guide case has shed additional connectivity. light on the challenge involved in We now turn our attention to connectivity facilitatation. 3 Facilitating Connectivity In speculating the Society of we are "All natural languages, cases .. One possibility is told, Mind (SOM), Minsky states that [11] have much the same kinds of nouns, verbs, adjectives, and that detailed syntactic restrictions are genetically encoded, directly, into our brains ... It would seem inevitable that some early high-level representations, developing in the first year or so, would be concerned with ... the 'things' that natural languages later represent as nouns. Here we indeed suspect generic prestructuring, within sensory systems designed to partition and aggregate their inputs into data for representing 'things." Further, we would need systems with elementary operations for constructing, and comparing descriptions." It is, of course, not our intent here to debate the speculation SOM. The important point is and evidences of that each local database can be perceived as genetically prestructured with certain syntactic restrictions, and designed to partition and aggregate data for representing "things." Further, we would need systems with elementary operations each local database is for constructing, and comparing descriptions. In this way, regarded as a "simple mind" with the "deep structures" [11] along with some means for manipulating them. The "simpTe mind" here, of course, refers to the "prestructured" local schema, and local DML is used to manipulate the "simple mind." If we accept be applied this analogy, then the transient case-shift i3) new to [11] can also where the most highly developed case has the best-developed description structure. 
Furthermore, Papert's Principle(3) can be applied on top of these "simple-minds" to accumulate new knowledge and organize the existing knowledge into different layers, with the top layer being an "intelligent global schema" capable of (a) activating the lower-level agents to retrieve the needed information from the local databases, and case-shifting to the upper layers for the most highly developed cases; (b) reconciling the semantic conflicts and information among different agents through the Principle of Noncompromise(4); and (c) transforming different sources of knowledge into composite-answers.

(3) States that some of the most crucial steps in mental growth are based not simply on acquiring new skills, but on acquiring new administrative ways to use what one already knows [12].

(4) States that the longer an internal conflict persists among an agent's subordinates, the weaker becomes that agent's status among its own competitors. If such internal problems aren't settled soon, other agents will take control and the agents formerly involved will be "dismissed" [12].

In light of this discussion, it is natural to apply frame-based representation [10] augmented with rule-based inference capabilities to facilitate connectivity.

Object-Oriented Approach

It is important to observe that certain basic principles can be applied to facilitate connectivity. For example, although tour-guides are idiosyncratic, each hotel has some rooms, and a room has some kind of facilities and services. As such, it can be represented as a composite-object with properties specialized in different granularities. These basic principles can serve as a reference for disparate databases, guiding the reconciliation.

The object-oriented approach enables us to encapsulate each database as an object. Messages can be sent among objects to obtain the information requested by a composite-query. In addition, object inheritance networks can be constructed to declare properties shared by different classes of objects. Active-values permit methods to be triggered (e.g., send a message to another object to obtain the interpretation of a hotel rating when information on rating is requested). Heuristic rules make it convenient to describe flexible responses to a wide range of events, including the situation where a unique global key identifier does not exist across databases (e.g., tour-guide databases use different key identifiers for hotels). In this way, knowledge of object attributes, inheritance properties, and heuristic rules can be accumulated in the CIS-Executive to reconcile semantic differences.

There are many object-oriented languages, such as LOOPS, that offer excellent features such as composite-objects and perspectives which are very useful in facilitating connectivity [16]. Since we wish to experiment with various novel concepts involving connectivity to multiple databases, we implemented a specialized frame-based knowledge representation scheme and rule-based inference capability using COMMON-LISP, as discussed below.

Implementation Strategy

An Abstract Data Base Management System (ADBMS) was implemented as a CIS-Executive to integrate disparate databases for composite-answers. It is a higher-level conceptual DBMS which conceals the implementation details of the actual DBMSs from other objects in the community. ADBMS sends queries (via messages) to multiple databases (e.g., AAA, FODOR, and MASS) to access the local data dictionaries, retrieve information, convert formats, and integrate different views. Adding a new DBMS will not result in any change to the existing applications.

Also implemented was a set of commands which provide the basic features of an object-oriented language with extensions to simplify constraint representation. Mechanisms are provided for interfaces with databases as well as for building, relating, and showing objects and knowledge. The functional relationship among ADBMS, database objects, and the actual DBMS is illustrated in Figure 5. The reader is referred to Levine [7] for a detailed description of the underlying Knowledge-Object REpresentation Language (KOREL). A simplified example is presented below.

Figure 5. Functional Relationship Among ADBMS and the Actual DBMS (AAA, FODOR, and MASS objects, each paired with its own database).

The Tour-Guide Case Revisited

Using the notation and the schemata in Figure 4, a Hotel-CIS (HCIS) object is realized in Figure 6 as a primitive ADBMS. Similarly, a partial representation of the FODOR object is shown in Figure 7. These objects are used below to present a simplified scenario of how to respond to the query "what are the facilities of Logan Airport Hilton" submitted by the user as discussed in the case study.

At the conceptual level, the command "get-object 'HCIS 'facility '(Logan Airport Hilton)" is executed to get a composite-answer of facilities from the HCIS object given the argument "Logan Airport Hilton." At the HCIS level, the following commands are executed.

  send-message 'FODOR 'all-facility '(Logan Airport Hilton)
  send-message 'AAA 'all-facility '(Logan Airport Hilton)
  send-message 'MASS 'all-facility '(Logan Airport Hilton)

  (HCIS
    (NAME: (VALUE-TYPE string))
    (ADDRESS: (VALUE-TYPE string))
    (RATE-CODE: (VALUE-TYPE string))
    (LODGING-TYPE: (VALUE-TYPE string))
    (RATING: (VALUE-TYPE string))
    (#-OF-UNITS: (VALUE-TYPE integer))
    (PHONE#: (VALUE-TYPE string) (MULTIPLE-VALUE-FUNCTION true))
    (DIRECTION: (VALUE-TYPE string))
    (FACILITY: (IF-NEEDED (QUERY find-facility)))
    (CREDIT-CARD: (VALUE-TYPE string) (MULTIPLE-VALUE-FUNCTION true))
    (SEASON-RATE: (VALUE-TYPE string) (MULTIPLE-VALUE-FUNCTION true))
    (LOCATION: (VALUE-TYPE string))
    (COMMENT: (VALUE-TYPE string))
    (PACKAGE: (VALUE-TYPE string))
    (SERVICE: (IF-NEEDED find-services)))

Figure 6. An Object Representation of the Composite-Model

  (FODOR
    (MESSAGE: (VALUE all-facility))
    (FACILITY: (QUERY find-facility))
    (CATEGORY: (CHOICES super-deluxe, deluxe, expensive, moderate, inexpensive)
               (... find-category)))

Figure 7. A Partial Representation of the FODOR Object

These messages trigger the AAA, FODOR, and MASS database objects to return all the facilities that they "know of." After all the facilities have been returned to HCIS, it reconciles all the differences from the local results, formulates a composite-answer, and then returns the composite-answer to IFEP for the user. In reconciling the differences, heuristic rules are used to identify a hotel or resolve contradictory information.

We illustrate the FODOR case at the database object level. In response to the message from HCIS, the FODOR object executes the command "get-object 'FODOR 'facility '(Logan Airport Hilton)." This command triggers the following asynchronous events:

• Establish the necessary connection and send queries to the FODOR database to retrieve the facilities in the FODOR-Facility relation shown in Figure 4.

• Execute the command "get-object 'FODOR 'category '(Logan Airport Hilton)" to establish the necessary connection and send queries to the FODOR database to retrieve the category of "Logan Airport Hilton." (The category turns out to be "expensive.")

• Execute the command "send-message 'FODOR-CATEGORY 'expensive 'facility" to infer additional facilities. The command triggers the FODOR-CATEGORY "expensive" object to retrieve the facilities implied by the category (i.e., bath or shower, heating, A/C, TV and phone in room, restaurant, and bar), and return the implied facilities to the FODOR object.

• Concatenate the facilities returned from the FODOR database and FODOR-CATEGORY; then return the concatenated results to HCIS.

Finally, at the actual DBMS level, the subqueries are executed to retrieve the facilities in FODOR-Facility and the category in FODOR-Info given "Logan Airport Hilton."

The implementation of this prototype has provided us with experience and feedback on the effectiveness of using an object-oriented approach augmented with a rule-based inference engine to handle semantic reconciliation, and an opportunity to see how we may devise a layered approach to organize the existing knowledge and accumulate new knowledge, as Papert's Principle suggests. We conclude with the following remarks.

4. Concluding Remarks

The major problems that need to be overcome in order to integrate disparate databases for composite answers include contradiction, inconsistency, and ambiguity. We have presented an approach to resolve these problems through enhancing the semantic power of the database integration. This approach exploits concepts drawn from frame-based knowledge representation scheme and rule-based inferencing.

We are actively researching connectivity in the data engineering context. Features crucial to connectivity in CIS are being designed and implemented on an AT&T 3B2 machine under the UNIX environment. The enhanced version will be used to dial directly into multiple on-line DBMSs simultaneously to demonstrate the feasibility of a system capable of attaining connectivity in a real environment with satisfactory performance.
State-of-the-art approaches in formulating composite-answers are mostly application dependent and ad hoc in nature. This research is intended to provide a theoretical foundation of connectivity in order to reconcile different assumptions and perspectives due to different mental models embedded in the different databases to be integrated.

References

1. Batini, C., Lenzerini, M., and Navathe, S.B. "A Comparative Analysis of Methodologies for Database Schema Integration," ACM Computing Surveys, Vol. 18, No. 4, December 1986, pp. 323-363.

2. Frank, W.F., Madnick, S.E., and Wang, Y.R. "A Conceptual Model for Integrated Autonomous Processing: An International Bank's Experience with Large Databases," to appear in the 8th Annual International Conference on Information Systems (ICIS), December 1987.

3. Goldhirsch, D., Landers, T., Rosenberg, R., and Yedwab, L. "MULTIBASE: System Administrator's Guide," Computer Corporation of America, November 1984.

4. Heimbigner, D. and McLeod, D. "A Federated Architecture for Information Management," ACM Transactions on Office Information Systems, Vol. 3, No. 3, July 1985, pp. 253-278.

5. Hewitt, C.E. "Offices Are Open Systems," ACM Transactions on Office Information Systems, Vol. 4, No. 3, July 1986, pp. 271-287.

6. Lam, C.Y. and Madnick, S.E. "Composite Information Systems - a new concept in information systems," CISR WP# 35, Sloan School of Management, MIT, May 1978.

7. Levine, S. "Interfacing Objects and Database," Master's Thesis, Electrical Engineering and Computer Science, MIT, May 1987.

8. Litwin, W. and Abdellatif, A. "Multidatabase Interoperability," Computer, December 1986, pp. 10-18.

9. Lyngbaek, P. and McLeod, D. "An Approach to Object Sharing in Distributed Database Systems," 9th International Conf. on VLDB, October 1983.

10. Minsky, M. "A Framework for Representing Knowledge," in Readings in Knowledge Representation, Brachman and Levesque (Eds.), Morgan Kaufmann Publishers, 1985.

11. Minsky, M.
"The Society Theory of Thinking." Proceedings of the Fifth International Joint Conference on Artificial Intelligence Cambrdige, Mass., August 22-25, 1977. . New York, 1986 INFOPLEX database computer: 12. Minsky, M. The Society of mind Simon and Schuster, 13. Madnick, 14. Madnick, S.E. and Wang, Y.R. "Evolution Towards Strategic Applications of Databases Through Composite Information Systems." Working Paper #136687, Sloan School of Management, MIT, Submitted for publication. 15. Integrating Heterogeneous Distributed Smith, J.M. et al. "ML'LTIBASE Database Systems," National Computer Conference 1981. pp. 487-499. 16. Stefik. . S. E. and Wang, Y. R., Modeling the Proceedings of the 6-th a multiprocessor systems with unbalanced flows. Advanced Database Symposium (August 1986), pp. 85-93. - M. and Bobrow. D.G. "Object-Oriented Programming: Themes and Variations." The AI Magazine Winter 1936. Vol. 6. No. 4. pp. 40 62. - . 23 1313^09^ Date Due P. Lib-26-6'; MIT 1 lUliAKIES lllllllllllllllllllllllllllllllllllll 3 TOflD ODM T3b 172