Design of a Genetics Database for Gene Chips and the Human Genome Database

by Benson Fu

Submitted to the Department of Electrical Engineering and Computer Science on May 22, 2001 in partial fulfillment of the requirements for the degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology.

© 2001 Massachusetts Institute of Technology. All rights reserved.

Certified by: C. Forbes Dewey, Jr., Professor, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

Human medical research has traditionally been limited to the analysis of disease symptoms. Although this research has produced many advancements in the medical field, the availability of human genetic sequence data will lead to further advances in diagnosis and treatment. With new sequencing technology and the near-completion of the Human Genome Project, the situation is rapidly changing. We have designed a database federation platform that manages gene chip experimental information and genetic data from the Genome Project.
The integration of both sources will provide a powerful information system for medical research purposes. Affymetrix gene chip data and a schema of the Human Genome were used to test the design.

Keywords: Human Genome Project, Affymetrix, GATC, genetic databases, gene chips, database federation, federating databases, query mediation, heterogeneous databases

Thesis Supervisor: C. Forbes Dewey
Title: Professor of Medical Engineering and Bioengineering

Contents

INTRODUCTION
TERMINOLOGY
I. BACKGROUND
  A. Past Projects
  B. Current Projects
  C. The Problem at Hand
    Human Genome Database
    GATC Database
    Querying Both Databases
II. DESIGN GOALS
III. TECHNOLOGY USED IN THE FEDERATION PLATFORM
  A. Storage and Processing with a Local Database
    Latency and Throughput
    Storage Management and Scalability
    Efficient Query Processing
    Future Benefits
  B. Interface and Transport with JDBC
    Simplicity and Versatility
    Object-Relational Support
    ODBC Comparison
    Current Implementation
  C. ClassMapper Concept
IV. THE FEDERATION PLATFORM DESIGN
  Starting the Federation Platform
  How a Query Is Structured
  When a Query Is Submitted
V. ARCHITECTURE
  ClassMapperRepository
  DistributedQuery (Data Structure)
  QueryDecomposer
  SQLQueryParser
  DBDelegator
  JDBCHandler
VI. IMPLEMENTATION
  ClassMaps
VII. DISCUSSION
  Bugs
    Malformed ClassMap files
    StringTokenizer Bug
    Large Data Sets
    Dropping Tables
  Future Improvements
    Threading capabilities
    Security
    Query Optimization
  Deploying the Federation Platform
BIBLIOGRAPHY
APPENDIX
  FederationPlatform.java
  ClassMapRepository.java
  ClassMap.java
  DistributedQuery.java
  SQLMonoDBQuery.java
  SQLTableQuery.java
  QueryDecomposer.java
  SQLQueryParser.java
  DBDelegator.java
  SQLJDBCHandler.java
  InformixJDBCHandler.java
Introduction

The Human Genome Project has expanded the horizons of both the biological and medical communities, the latter of which is the ultimate consumer of these advances. Medical research into human diseases has been based mostly on the analysis of symptoms and, more recently, on the use of genetic sequences. Until several years ago, sequencing was a prohibitively expensive endeavor. With current technological advances and the huge push of the Human Genome Project, it appears that the relevant sections will be sequenced within the next year. This wealth of data can be used for medical research, but the raw data must be organized into a coherent schema, one that links it to relevant information.

This thesis proposes an application design for handling Affymetrix gene chip databases and the Human Genome Database (HGDB). The application can be used to access genetic data from a gene chip database and the Human Genome Database as if both were combined into a single database. It uses a query-mediated approach to create a database federation in which both databases remain autonomous. The ClassMapper concept was implemented in the system to provide descriptions of the underlying databases. As a proof of concept, sample data contributed by the Sorger Lab at MIT was used. This program will allow researchers to link their experimental data with the information held within the Human Genome Database.

The Human Genome Database is essentially a large, distributed, work-in-progress effort that acts as an "encyclopedia" of information about which genes are related, how they are related, where each gene is located, what research has been done for each gene, and other related information. Gene chips and DNA microarrays, on the other hand, are the commercial tools for high-volume genetic testing of mRNA samples. The two are related in that they both deal with genomic information.
While one holds records of clinical DNA tests, the other describes the biological behaviors of the DNA. To leverage the information from both, a system must be able to seamlessly access the data contained in both databases.

The key benefit of implementing a system that can interpret the data from gene chips in conjunction with the Human Genome Database is that cross-realm queries become possible. In the case of Affymetrix gene chips, the results are output to a database containing experimental data. This database was designed for experiment analyses and thus contains limited information. The Human Genome Database was designed for accumulating and distributing genetic data. Tying the two together would enable the user to make queries that allow data mining across the two domains. This would be especially useful since it would let the user easily perform compound and complex queries to obtain information that is not contained in the gene chip database itself.

This document investigates a database federation approach to enable cross-realm queries. An application, referred to as the federation platform in this document, was implemented as a proof of concept to handle the gene chip database and the HGDB. The federation platform was written in Java, and several new technologies were used to align the application with the design goals mentioned later in this document.

Terminology

This document uses certain terms that have different meanings from document to document. To clarify how they are used later in this paper, the following terms are defined.

Aggregate query - The query run after all of the data has been aggregated on the local database.

Database federation - A system that accesses heterogeneous databases in a loosely coupled manner. For most database federations, site autonomy is preserved.

Distributed database - A system that accesses homogeneous databases in a tightly coupled manner.
Distributed databases usually have limited site autonomy. Currently, several major database vendors support distributed databases.

Data warehousing - The concept of storing information from data sources or databases in a central repository. This repository is then used as the access point for retrieving information.

DBMS - DataBase Management System.

DBPath - A qualifier used in the query syntax to indicate in which database a table resides. The syntax for a DBPath is [DatabaseName]->[TableName]. More information about DBPaths appears later in this document.

End-database - An individual database that is contained in the database federation.

Federated query - A query sent to the federation platform that might access multiple databases.

Federation Platform - The federated database system that this document describes.

Local Database - The database on the server or intranet used to store tables from end-databases.

Multidatabase system - A system that programmatically accesses multiple databases.

I. Background

Prior to designing the system, many multidatabase system designs were investigated. Pre-existing databases, existing software, existing hardware, user requirements, and bandwidth requirements are the determining factors in deciding which system is optimal. The table below shows the distinctions among different multidatabase systems. Each system is classified according to how closely the global system integrates with the local database management systems.

Type of system                                             | Global system has access to...                | Local nodes typically are...             | Means of global information sharing
Distributed database (tightly coupled)                     | Internal DBMS functions                       | Homogeneous databases                    | Global name space; global schema
Global-schema multidatabase (loosely coupled)              | DBMS user interface                           | Heterogeneous databases                  | Global schema
Federated database (loosely coupled)                       | DBMS user interface                           | Heterogeneous databases                  | Partial global schemas
Multidatabase language system (loosely coupled)            | DBMS user interface                           | Heterogeneous databases                  | Access language functions
Homogeneous multidatabase (application on top of the DBMS) | DBMS user interface + some internal functions | Homogeneous databases                    | Access language functions
Interoperable system                                       | Data exchange interface                       | Any data source that meets the protocol  | Data exchange protocol

Table 1. Taxonomy of information-sharing systems.

After this investigation of multidatabase systems, the federated database system design was chosen. While making it appear as if all of the end-databases are merged into one, a database federation keeps each end-database autonomous so that each is affected as little as possible. In addition, the nature of the database federation allows for heterogeneity among its end-databases, an important benefit when dealing with biological databases. The ClassMapper concept [see the Technology Used section later] was also investigated, since its use aids in homogenizing various heterogeneous data sources. Because many biological databases are very heterogeneous, this was an important issue for the system design.

Past and present projects in the field of federated database systems were researched. Specifically, the historical aspects of past projects and the designs of many current systems were studied. The tradeoffs of each system helped determine the design of the system this document describes. The major projects most relevant to the problem at hand are discussed below.

A. Past Projects

In 1994, there were many ongoing projects involving global-schema multidatabase systems and federated databases.
At the time, each global-schema multidatabase project was either in the research or prototype stage. Many of these projects have since vanished, as have many of the federated database projects that existed in 1994. Two noteworthy examples are discussed below.

Mermaid, a global-schema multidatabase prototype made by Unisys, showed great promise in the late 1980s [38]. Mermaid's hope was to become a front end to distributed heterogeneous databases. The plan was to allow users of multiple databases stored under various relational DBMSs to manipulate data using SQL or ARIEL (a proprietary query language). The complexity of the distributed, heterogeneous data processing was to be transparent to the user. Mermaid's main emphasis was query-processing performance: the internal language DIL (Distributed Intermediate Language) was optimized for interdatabase processing. Mermaid evolved into InterViso, a commercial product sold by Data Integration, Inc. Ultimately the commercial product was discontinued, and little is known about its last developments.

Pegasus was designed as a federated object-oriented multidatabase at the Hewlett-Packard Laboratories [38]. The attempt was to build a full DBMS that could integrate heterogeneous, remote databases. The hope was to have global users add remote schemas to be imported into the Pegasus database, thus making it a dynamic federated database. Non-object-oriented schemas were mapped to object-oriented representations within the global database. The global access language HOSQL (Heterogeneous Object SQL) had features of a multidatabase language system; however, local users were responsible for integrating imported schemas. Although the system gained quite a bit of publicity, Hewlett-Packard eventually discontinued its work on Pegasus.
There is little documentation as to why HP stopped further development, but it is known that its publications ceased in 1993, when the system was still in its research phase.

B. Current Projects

One can only speculate as to why the former systems stopped being developed. Perhaps creating a generalized federated system was more difficult than the companies first anticipated. It is also possible that the companies simply shifted their focus away from federated database systems. Regardless, the problem of conquering database heterogeneity still exists today. Federated database systems in the biological realm are still being developed. Several ongoing projects that tackle the same type of problem as the federation platform are described below.

One tool was built by researchers in the Kleisli Project at the University of Pennsylvania in Philadelphia [41]. Their tool allows scientists to use a single query interface to compare their data against a variety of collections. Kleisli is currently a tool for the broad-scale integration of databanks that supposedly offers "flexible access to biological sources that are highly heterogeneous, geographically scattered, highly complex, constantly evolving, and high in volume". The tool does handle a wide variety of data sources, but at the expense of ease of use. Since the system was meant to handle nearly any type of data source, its query language is very complicated and difficult to use. The system overcomes heterogeneity by expanding the language set for each different data source. Obviously, the language set becomes more complicated as more data sources are supported. This is where the ClassMapper concept could come into play to condense the query language by homogenizing the databases or data sources.
As a last note, the Kleisli product was carried into the commercial world as "gX-Engine", now owned by the company GeneticXchange Inc. [42].

Another company, LION Bioscience AG in Heidelberg, Germany, markets a tool called SRS [43]. This tool helps many pharmaceutical firms integrate their databases. The system handles quite a variety of databases; however, attaching a new database requires a fair amount of work. Each database type added to the system must have a specialized interface programmed into SRS. Again, the ClassMapper concept could be used to overcome this non-generalizable approach. Transforming a number of heterogeneous databases into a set of homogeneous databases would allow the federated system to manage its data without being limited by the interfaces of the individual databases. Having a ClassMapper for each database could provide this homogeneity.

MARGBench is a system that enables querying several databases in SQL by translating SQL queries into a source-database-specific interface [24] [44]. Developed at the Otto-von-Guericke-University in Magdeburg, Germany, MARGBench is a database federation that simultaneously queries biological source databases online. Its architecture is similar to the architecture of the federation platform in many ways: MARGBench has a SQL interface, uses the concept of Adapters instead of Handlers (mentioned in the paper), takes advantage of JDBC connections to end-databases, and is able to make cross-realm queries across a number of heterogeneous databases. The system even has a local database to handle caching of the data. Where MARGBench and the federation platform differ is in the way the end-database table information is revealed. In the federation platform, ClassMaps reveal the table information of the end-databases before queries are handled. In MARGBench, the concept of an ontology is used.
The ontology is effectively a list of connections that link the data between the end-databases. While this concept is useful in connecting data, it does not help to overcome heterogeneity issues. Custom-made adapters must still be built for each type of database in its federation. As with the federation systems previously mentioned, this non-generalizable approach does not scale well when there is a large amount of heterogeneity. For the federation platform, the evolution of the ClassMapper will eventually consolidate database communication into a single interface no matter how heterogeneous the underlying databases are.

C. The Problem at Hand

Human Genome Database

The Human Genome Database (HGDB) contains terabytes of information, including genetic sequences and related metadata. When the Human Genome Database was first designed, it made sense to organize the data in an object-oriented fashion. Because the nature of the data had fixed associations such that genome data could be treated as objects, the system was built to handle genomic segments as objects that contain names, descriptions, and associated links to other objects. In addition, the requirement of managing such large amounts of data lent itself to an object-oriented design, which is relatively scalable. The information contained in the Human Genome Database can be broken down into three main object types:

* Regions of the human genome, including genes, clones, amplimers (PCR markers), breakpoints, cytogenetic markers, fragile sites, ESTs, syndromic regions, contigs, and repeats.
* Maps of the human genome, including cytogenetic maps, linkage maps, radiation hybrid maps, content contig maps, and integrated maps. These maps can be displayed graphically via the Web.
* Variations within the human genome, including mutations and polymorphisms, plus allele frequency data.
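The object-per-segment organization described above can be sketched in simplified Java. This is an illustrative assumption, not the HGDB's actual schema: the class and field names below are hypothetical, but they capture the idea of a genomic object carrying a name, a description, and named links to related objects.

```java
import java.util.*;

// Hypothetical sketch of the HGDB's object-oriented organization.
// Class and field names are illustrative, not the real HGDB schema.
public class GenomicSegment {
    final String accessionId;   // unique identifier for the segment
    final String name;
    final String description;
    // Links to related objects, keyed by relation type (e.g. "locatedIn").
    final Map<String, List<GenomicSegment>> links = new HashMap<>();

    GenomicSegment(String accessionId, String name, String description) {
        this.accessionId = accessionId;
        this.name = name;
        this.description = description;
    }

    // Record an association to another object under a named relation.
    void link(String relation, GenomicSegment other) {
        links.computeIfAbsent(relation, k -> new ArrayList<>()).add(other);
    }
}
```

Traversing such links (gene to region, region to map, and so on) is what the later cross-realm queries amount to.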
The database contains a wealth of information that can reveal large amounts of genomic information about a gene sequence or gene fragment. However, the interface to the database is designed for inserting and extracting the data, not for data mining or complex querying. Thus the information contained in the Human Genome Database is not being utilized to its full potential. William Chuang's work with the HGDB was used as an example database in the federation platform. In his project, an object-relational implementation and a ClassMap for the HGDB were created. The schema was implemented as an end-database in the federation, and the ClassMap was extended to also describe connectivity information.

GATC Database

The companies Affymetrix and Molecular Dynamics teamed up to form the Genetic Analysis Technology Consortium (GATC) to build a platform to design, process, read, and analyze DNA-chip arrays [10]. One product that came out of the GATC was a specification for a database to handle the data of DNA-chip experiments. The information contained in GATC databases consists of recorded intensities that correspond to the amount of targeted DNA in the sample. These DNA targets correspond to specific DNA segments that are characterized by certain biological behaviors. For more information about how DNA-chip arrays work, see [45]. The GATC database specification has a basic relational architecture geared to store experimental data. In William Chuang's work, an object-relational implementation of the GATC database was created. It was decided that the database needed to be object-relational to give researchers the ability to easily import their experimental data directly into the new ORDBMS without requiring additional massaging. William Chuang's GATC database was used as an additional example database in the federation platform.
Its schema was implemented as an end-database in the federation, and the ClassMap was extended to also describe connectivity information.

Querying Both Databases

To truly leverage the experimental data in the GATC database and the genomic characteristic information in the HGDB, the two must be used together. The DNA identifiers (called AccessionIDs) in the GATC database correspond to DNA segments with genomic characteristics. However, these characteristics are stored in the tables and connections of the HGDB. In order to associate a particular DNA segment from an experiment with its genomic characteristics, both databases must be used in conjunction with each other. The problem then becomes the task of querying data across two database domains. Merging the two databases is a feasible solution to the problem, but not necessarily the best route. Updating the table information from both sources can be a cumbersome task, especially if there is no mechanism to tell whether a table needs to be updated. Furthermore, the HGDB contains terabytes of information, and managing massive amounts of data could be a challenging task in and of itself. The federation platform that this document describes is a solution to this problem. The two databases are left autonomous as end-databases of the system. Retrievals from both are performed ad hoc so that the data are fresh. The system also allows cross-queries across both domains without having to merge the two schemas. The next section describes the design of the federation platform.

II. Design Goals

The architecture of the system was designed to be a federated platform with the following design goals in mind.

Data Freshness

Since biological databases are constantly being updated, having the most up-to-date information is often important to the work of the researcher. Working against old data can sometimes mean the difference between the success and failure of an experiment.
As mentioned before, data warehousing has many advantages of its own, but it sacrifices freshness, since data fetching is not performed at the time tables are accessed by the system. In the proposed architecture, the user of the system is guaranteed fresh data since all of the data fetching is performed on an ad hoc basis. The tradeoff of this design goal is that if a database in the group is down, the query will fail [See Figure 2]. In addition, without optimizations, the guarantee of fresh data comes at the sacrifice of speed, especially if the network connection is slow.

Site Autonomy

Many of the existing biological databases were designed to serve the purpose of receiving and hosting biological information. For nearly all of these databases, the database structures were not intended to be changed. Thus, modifying the underlying structure of each database would require a great amount of work. What the federation platform allows is site autonomy of existing databases. That is, the platform does not require modifications to the end-databases for them to be used in the system. The federation platform only requires query access to the end-databases. In addition to not touching the underlying structure, the database federation requires no special maintenance at its end-databases. This is especially useful since most biological databases are maintained by specialized groups that do not have the time or resources to make modifications for non-critical components of their system. Again, the federation architecture allows for the site autonomy that is often required for adding certain databases.

Flexibility/Expandability

To handle additional end-databases with different means of connectivity or querying interfaces, the system must have a flexible and expandable architecture. This thesis seeks a partial solution with an expandable architecture.
Since the architecture was designed so that a new handler object is instantiated for each database registered in the federation, the system can simultaneously use different database interfaces. This means that as new database interfaces are created for the platform, old ones do not have to be upgraded or sacrificed, since all can be used concurrently. This functionality allows future support for a large variety of databases in the database federation. Ideally, the system will increase its expandability with the evolution of the ClassMapper concept. The current system utilizes the ClassMapper concept with ClassMaps of each database. As mentioned in the Technology Used section, the ClassMapper concept aims to reduce heterogeneity by providing the database or data source with a homogeneous presentation to the outside world. An evolved ClassMapper would provide this to allow universal flexibility and expandability for practically all systems that access multiple databases.

Scalability

In order to completely leverage the power of a database federation, the system must be able to support many databases simultaneously. If only a small number of databases can be queried during a single federated query, then the utility of the system drops dramatically. Especially in the realm of biological research, multiple databases must be used at the same time or the data could be incomplete. The design of the federation platform theoretically scales to an unlimited number of end-databases, primarily because of how the local database is used. As mentioned later in the thesis [See The Federation Platform Design], when a federated query is submitted, the federation platform copies the vital data of the accessed tables to the local database. This is performed one table at a time until all required table information is transferred to the local database.
Once all of the information is aggregated, the federation platform finally runs a query on the local database, and the results are returned to the user. In some sense, the local database effectively acts as a buffer between the end-databases and the user [See Figure 2]. Viewed in this framework, the table information from the end-databases is collected in the local database until all of the required data is transferred. Each transaction of copying a partial table from an end-database to the local database is done separately so that transfers do not consume large amounts of system resources. The system can regulate how many resources are used for copying to ensure that the system does not become overloaded. Even if there is a large number of tables that need to be copied, the system can transfer the data as fast or as slow as the available system resources allow. Once all of the tables are inserted into the local database, the local database can be queried for the results to be returned back to the user. This approach allows the system to scale to an effectively unlimited number of table transfers since the local database is used as a buffer for table information. The tables are added piece by piece into the local database until all of the tables are collected. In addition, since the platform can regulate the transfers by system resources, the size of the transactions does not matter. Ultimately, this design allows the system to scale as more tables and databases are added to the database federation.

Transparency

Transparency of the accesses to end-databases is important because of the potential complexity of the database federation. The targeted users of the system are researchers who may have little or no experience with database management; since such users could easily get confused handling database operations, all database accesses are managed by the federation platform.
By hiding this from the users, the database federation appears to act as one large, single database. This transparency adds to the ease-of-use of the entire system. In addition to the ease-of-use benefit, hiding the transactions of the underlying databases increases the overall security of the system. If the transactions of the end-databases were observable, potential hackers could glean extra data about each end-database. This extra data could contain information that could help a hacker discover the location of the database and exploit holes in the system. This transparency helps to avoid security problems by removing the user from the entire transaction process.

Portability

Since the federation platform was written in Java and uses JDBC for connectivity to the local database, it can be run on any up-to-date Java VM. With the support of Java VMs on MacOS, many flavors of Unix, and Windows operating systems, the platform can be run on a wide variety of machines with practically no modifications to the code. JDBC is platform neutral as well, thus requiring no major connectivity changes in the code. In addition to JDBC allowing the server and application to be on different platforms, Informix itself is written for a variety of operating systems. This also increases the portability of the system, since even the local database can easily be ported to different machines.

Usability

Because many database federation systems attempt to accommodate so many heterogeneous data sources, their querying languages are difficult to learn and use. Because researchers who use biological databases do not usually come from a strong computer science background, they often find learning a new computer language daunting and difficult. This federation platform is comparatively usable when measured against other systems that use a cumbersome querying language.
This is because queries in the federation platform are similar to standard SQL queries in that they follow a "SELECT-FROM-WHERE" clause format. By using queries similar to SQL, users who already know SQL can immediately begin using the federation platform since they are familiar with how queries are formed. For those users who are not familiar with SQL, the querying language can be learned very quickly since it has a low learning curve. Overall, the design decision to use this querying language format makes the system more usable both to those who are unfamiliar with standard SQL and to those who are familiar with it.

III. Technology Used in the Federation Platform

A. Storage and Processing with a Local Database

To utilize the functionality built into database management systems, a decision was made to implement a localized database in the system. This was done for several improvements in design and performance. They are as follows:

Latency and Throughput

By storing the table information on a system near the federated platform, the table information can be obtained with a low latency (low access time) and a high throughput (large bandwidth). Ideally, the database will run on the same server as the federation platform so that network interfaces will not hinder the overall performance of the system. However, even if the database resides on another server in the intranet, current network speeds of 10baseT or 100baseT (maximum theoretical throughputs of roughly 1.2 MB/sec and 12 MB/sec, respectively) are adequate to serve the information flowing between the database and the platform effectively. If caching of the tables is used in future implementations, then the federation platform must access the data multiple times. In order to make the remote database tables available to the system for multiple accesses, the data needs to be stored in a location where it can be accessed quickly.
Accesses to a "local" copy of the tables will reduce the amount of time it takes to process a federated query since the data would not need to be fetched again. For a caching system in the future, it would be ideal to cache large amounts of data in the local database and have the federation platform check its freshness before it is retrieved from the cache. This may be necessary, especially for the Human Genome Database and other large databases, where the lack of a caching component would potentially require gigabyte-sized fetches.

Storage Management and Scalability

The fetched table data from the separate end-databases must be stored before it is processed and sent back to the user. If the tables are stored as JDBC Java objects, large amounts of memory are consumed unless the objects are written to disk. However, even if the tables are written to disk, all table objects must be loaded into the memory of the system to efficiently process the table information. Thus, using a local database helps to overcome these problems in managing the database information. Utilizing a local database simplifies the information management and efficiently handles storage since databases are built with these goals in mind. Storage scalability is another advantage that comes with using a local database. The databases and system resources of today can handle information sizes on the order of many gigabytes. Storing large amounts of data from end-database tables is not a problem with a local database. If the user of the federation platform anticipates that a query will require huge amounts of table data to be accessed, then the space of the local database can be adjusted accordingly. In addition, the local database provides the added benefits of security, recovery, and data integrity that are already built into the Database Management System (DBMS). Although these features are not part of the design goals, they could come into use for future goals of the federation platform.
Efficient Query Processing

Before results can be sent back to the user, tables from the separate end-databases must be fetched and processed according to the conditions of the query. Since partial table results are returned from the end-databases, it seems natural to store these tables in a database and process the information from there. Building a module that could process the data based on the SQL conditions of a federated query would essentially be reinventing the query engine of a standard database management system. Therefore, it was decided that it would be more efficient to use the query engine of a database instead of a self-made data processing module. The most practical way of using a pre-existing query engine was to implement a local database into the system. By adopting the query engine of the local database, the processing capabilities of the federation platform became as scalable and efficient as the local database itself.

Future Benefits

To deal with more complex queries in the future, the federation platform can use the built-in functionality of the local database to aid in processing. For instance, many databases have object-relational schemas. For these schemas to be supported by the federation platform, the system must be able to handle object-relational processing. In future versions, the system could be tailored to support object-relational operations by using the preexisting processing capabilities of the local (object-relational) database. With the local database, caching of table data is a feature that could be added to the federation platform. Table data fetched from end-databases could be stored in the local database and reused until the information expires. The caching scheme would have to incorporate a time stamp to calculate the "freshness" of the data in the tables since there would be no guarantee that the cached data would be up-to-date.
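A timestamp-based cache of the kind described above might be sketched as follows. This is a hypothetical illustration of the idea, not part of the implemented platform; the class name, expiry policy, and row representation are all assumptions.

```java
import java.util.*;

// Sketch of the timestamp-based table cache described above.
// The expiry policy and all names are hypothetical.
public class TableCache {
    private static final class Entry {
        final List<Map<String, String>> rows;
        final long fetchedAtMillis;  // time stamp used to judge "freshness"
        Entry(List<Map<String, String>> rows, long fetchedAtMillis) {
            this.rows = rows;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long maxAgeMillis;

    public TableCache(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

    public void put(String table, List<Map<String, String>> rows, long now) {
        cache.put(table, new Entry(new ArrayList<>(rows), now));
    }

    // Returns the cached rows only while they are still fresh; otherwise
    // null, signalling that a re-fetch from the end-database is needed.
    public List<Map<String, String>> getIfFresh(String table, long now) {
        Entry e = cache.get(table);
        if (e == null || now - e.fetchedAtMillis > maxAgeMillis) return null;
        return e.rows;
    }

    public static void main(String[] args) {
        TableCache cache = new TableCache(60_000);  // one-minute freshness window
        cache.put("CHIP_DESIGN", List.of(Map.of("NAME", "HuGeneFL")), 0);
        System.out.println(cache.getIfFresh("CHIP_DESIGN", 30_000) != null);
    }
}
```

A stale entry simply forces a fresh fetch, so the cache never weakens the freshness guarantee discussed in the design goals.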
Further details would be determined later when this functionality is implemented into the system. Regardless, having the local database could tremendously reduce the amount of code needed to add a caching feature.

B. Interface and Transport with JDBC

Java DataBase Connectivity (JDBC) has recently emerged as a growing standard for database connectivity. With the explosion in adopters of Java, Java-based standards have emerged. What JDBC provides Java developers is a standard API that is used to access databases, regardless of the driver and database product. That means that any Java application can connect to nearly any database no matter what platform the application runs on or where the database resides. This is possible in part because of the acceptance of the JDBC standard by both Sun, the creators of Java, and all major database vendors including Oracle, Informix, and IBM. This portability makes JDBC ideal for applications that need to access databases over networks. For the design of this system, JDBC is used for database connectivity to the end-databases as well as the local database. This decision was based on the following advantages that JDBC provides:

Figure 1. A client connecting to a database via JDBC.

Simplicity and Versatility

A large amount of value is gained by using JDBC because of the simplicity of data access and data manipulation. In JDBC, the lower level connectivity layers are hidden from the developer to allow easier data access. The user has to specify only a valid TCP/IP location of the server running the database and the correct authentication for a connection. Figure 1 above demonstrates how a client machine connects to a database via JDBC. During JDBC operations, database queries are returned as ResultSet Java objects from the java.sql package.
A single ResultSet object contains the complete information that is normally returned from the query. The data contained in the object can then be accessed programmatically by traversing through the rows and columns. Field types and values are easily extracted and turned into Java objects or primitives through the standard methods in the ResultSet object. JDBC allows scrolling through the ResultSet object so that the programmer can quickly jump to any row, column, or field. Even inserting data is made easy by using methods in the package that allow rows to be inserted programmatically (as opposed to sending a database-specific text insert statement to the database). Support for batch updates is another key feature that makes JDBC very attractive.

Object-Relational Support

JDBC also goes beyond relational processing. The movement for better object-relational database connectivity has pushed JDBC to support object-relational features. Support for various data types already exists, albeit with limitations on handling large objects; however, the scalability and the expanding number of supported data types will make JDBC the preferred approach to handling object-relational data accesses. As mentioned in previous sections, strong object-relational support is key for biological databases since much of the associated information is metadata that must be stored as objects. JDBC's continued movement in this direction will ultimately help make it a tool for managing biological information.

ODBC Comparison

Currently, other forms of connectivity such as Open DataBase Connectivity (ODBC) and ODBC <-> JDBC bridges still exist since they are connectivity standards still in use today. However, the major database vendors have recognized the growing demand for Java applications and have shifted their focus from developing ODBC to JDBC. With this progression toward a more Java-centric slant, ODBC is slowly becoming outdated.
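The connect-query-traverse pattern described above can be sketched as follows. The host, port, database name, and credentials are hypothetical, and running the fetch itself would require a reachable Informix server with its type 4 driver on the classpath; only the URL construction is exercised here.

```java
import java.sql.*;

// Sketch of the JDBC access pattern described above; connection details
// are hypothetical placeholders, not the thesis's actual configuration.
public class JdbcSketch {
    // Informix type 4 driver URL format:
    //   jdbc:informix-sqli://host:port/db:INFORMIXSERVER=name
    public static String buildUrl(String host, int port, String db, String serverName) {
        return "jdbc:informix-sqli://" + host + ":" + port + "/" + db
                + ":INFORMIXSERVER=" + serverName;
    }

    // Standard connect / query / traverse pattern (not run here, since it
    // needs a live database server).
    public static void printNames(String url, String user, String password)
            throws SQLException {
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT NAME FROM CHIP_DESIGN")) {
            while (rs.next()) {                        // walk the rows...
                System.out.println(rs.getString("NAME")); // ...and extract each field
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 1526, "gatc", "BENFU"));
    }
}
```

The try-with-resources form is a modern convenience; the era's code would close the Connection, Statement, and ResultSet explicitly, but the traversal pattern is the same.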
Since ODBC is accessed via C or C++, client software must be written in C or C++. While this is appropriate for many scenarios (where the point of database access is a C or C++ program), to use Java with a database that only connects via ODBC, an ODBC <-> JDBC bridge has to be used. While this conversion mechanism works for most cases, the bridge increases the number of inter-operating parts and potential sources of failure for the system. When a Java program is being used to communicate with a database, it is best to use pure JDBC.

Current Implementation

Informix uses a type 4 JDBC driver. A type 4 driver is a pure Java driver that uses a native protocol to convert JDBC calls into the database server network protocol. Using this type of driver, the Java application can make direct calls from a Java client to the database. A type 4 driver, such as the Informix JDBC Driver, is typically offered by the database vendor. Because the driver is written purely in Java, it requires no configuration on the client machine other than telling the application where to find the driver. Once the driver is loaded, the application can access the database via the JDBC interfaces. In the federation platform implementation proposed in this document, JDBC database adapters were constructed as part of the platform. The reading adapter interface that was created fetches database information via JDBC and passes the table information back to the platform. For the local database that is located on the intranet (for low latency and high bandwidth), an additional JDBC adapter was made with database writing capabilities. Using JDBC for reading from the database federation and writing to the intranet database allows for very clean and compact code for table transfers. (See Appendices SQLJDBCHandler.java and InformixJDBCHandler.java.)

C. ClassMapper Concept

In the world of biological databases, the databases and data sources are very heterogeneous.
The ways the systems are accessed span a range of interface types, and their underlying database structures vary greatly as well. A query for one database may have a completely different syntax or semantic structure than a query for another. Many of these biological databases have query languages that were designed without the goal of using pre-existing semantics. Therefore, many of these databases have very different interfaces. In order to access a variety of these databases, the user must learn how to use each one of them. In addition, if a programmer wishes to build an application that accesses the databases, he must build a special interface for each heterogeneous database. Thus, there is a need for standards when it comes to biological data sources [22]. Patrick McCormick's document [2] details the concept of a ClassMapper. The main motivation for this concept is to conquer the heterogeneity of data sources that is so prevalent across medical and biological databases. A ClassMapper is an application that "sits on top" of a database (or data source) to standardize its presentation to the outside world. All communication between the ClassMapper and the database is hidden since the ClassMapper serves all information requests from the user. The added benefit is that the user can interact with each ClassMapper in the same way since the interface is standardized across every ClassMapper. Therefore, in some sense, each database "looks the same" to the user since obtaining information is performed in the same manner. With a ClassMapper residing on each database, all of them appear to be homogeneous. This concept is still being refined; however, it is apparent that standards need to be put in place to overcome heterogeneity. The concept of the ClassMapper was used as part of the federation platform. Since the standards for ClassMappers are still largely undefined, descriptions of the HGDB and the GATC database were used.
These descriptions are called ClassMaps since they are standardized descriptions of the databases. These ClassMaps were obtained from William Chuang's work described in [21]. In this implementation, the ClassMaps were extended to include connectivity information for the end-databases. They were used by the federation platform to build a map of the tables contained in each database.

IV. The Federation Platform Design

Starting the Federation Platform

Before the system can be used to query across the federation, the accessible databases must be properly registered with the federation platform. For each database in the federation, the ClassMap must be registered. The ClassMaps in the current implementation contain not only the table information about the databases, but also the network location and authentication keys. These ClassMaps are stored as local files on the server running the federation platform and are automatically loaded when the application starts. Once the platform is loaded, it is ready to accept queries from the user.

How a Query Is Structured

A distributed query is similar to the standard SQL format [39]. The queries are structured with "SELECT", "FROM" and "WHERE" clauses. These clauses must be ordered correctly or else the system will not function properly. The order must start with "SELECT", followed by "FROM" and then followed by "WHERE". This restriction in clause order is similar to the rules in standard SQL. In standard SQL, the column or columns specified in the "SELECT" clause are the columns that will return their results to the user. The "FROM" clause contains the table or the list of tables that are accessed by the query. If another clause in the query attempts to access a table that isn't explicitly declared in the "FROM" clause, the query is not processed. Therefore, all tables used in a query must be declared in the "FROM" clause.
The "WHERE" clause contains an optional list of conditions to restrict the information returned back to the user. These conditions can be set as equalities or inequalities, comparing columns against values or columns against other columns. See [39] for more details about SQL. In the federation platform, SQL queries are structured in practically the same way as standard SQL. A federated query has "SELECT", "FROM", and "WHERE" clauses that must be placed in the same order as a standard SQL query. Since all tables in the database federation are registered when the ClassMaps are loaded, when a table is referenced in any of the clauses, the federation platform knows whether the table exists and on which database the table resides. Therefore, the user needs only to specify table names and columns in the clauses; the platform takes care of the rest. By hiding from the user where each table is located, the platform lets the user view the database federation as one large, single database. The user can then query against the federation as if it had all of the end-databases combined into one database. This functionality meets the design goal of creating transparency for the user. Within the federation platform, the query is transformed internally into a query called a DBPath query. This type of query contains end-database names as prefixes to each table in the form [DatabaseName]->[TableName], or [DatabaseName]->[TableName].[ColumnName] if a column is specified. This syntax makes explicit references to specific end-databases instead of relying on ClassMaps. The system has the capability to accept DBPath queries directly from the user if he chooses to use this format. This feature is useful when the user wants to specify exactly where the table is being retrieved from. More details about the DBPath format are explained in the Architecture section.

When a Query Is Submitted

After a query is passed into the federation platform, the text of the query is passed into a decomposer module.
The query is decomposed into end-database queries based on rules that are coded into the platform. In several other federated database systems, such as those in [24] [26] [43], the systems decompose queries according to rules that are stored in a separate knowledge base adjacent to the system. These systems have their rules detached from the main system and read in by a module before processing any queries. Having a rule reader as part of the main implementation reduces the amount of work needed when upgrading the rule set. It also makes it easier for users who are not familiar with the source code to see how the system decomposes queries. However, to build a rule reader module, a flexible and upgradeable syntax must be created. Building this module would also take a considerable amount of extra work beyond building a single module with the rules hard-coded. This separation of rules from the main system should be investigated in the future to see whether the design decision is appropriate for the federation platform. The logic of the current implementation is to copy table information from the end-databases into the local database. The federation platform does this systematically by parsing the federated query to determine each table that is accessed by the query. Once all of the tables are determined, the platform constructs end-database queries to retrieve those tables. Several optimizations are put into the end-database queries to download only certain parts of the tables. Specifically, the "WHERE" conditions are inserted into certain end-database queries when possible to narrow down the information returned by the end-databases. This reduction in data transfer decreases the amount of time it takes to retrieve all of the table information from the databases for the federated query. Once all of the table information is retrieved, it is programmatically inserted into temporary tables on the local database.
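The "WHERE" push-down just described can be sketched as follows. The parsing here is deliberately naive and the table and column names are hypothetical; it only illustrates the idea of attaching to each end-database query those conditions that mention that table alone.

```java
import java.util.*;

// Sketch of per-table end-database query construction with "WHERE"
// push-down, as described above. Naive parsing; names are hypothetical.
public class TableQueryBuilder {
    // Build the end-database query for one table, attaching only those
    // conditions whose column references all belong to that table.
    public static String buildQuery(String table, List<String> whereConditions) {
        List<String> applicable = new ArrayList<>();
        for (String cond : whereConditions) {
            if (referencesOnly(cond, table)) applicable.add(cond);
        }
        String sql = "SELECT * FROM " + table;
        if (!applicable.isEmpty()) sql += " WHERE " + String.join(" AND ", applicable);
        return sql;
    }

    // True if every qualified reference (TABLE.COLUMN) in the condition
    // names the given table; cross-table joins are left for the aggregate query.
    private static boolean referencesOnly(String cond, String table) {
        for (String token : cond.split("[^A-Za-z0-9_.]+")) {
            int dot = token.indexOf('.');
            if (dot > 0 && !token.substring(0, dot).equals(table)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> where = List.of("CHIP_DESIGN.ID = 7",
                                     "CHIP_DESIGN.ID = EXPERIMENT.CHIP_ID");
        System.out.println(buildQuery("CHIP_DESIGN", where));
    }
}
```

Single-table conditions travel with the end-database query and shrink the transfer; join conditions stay behind for the aggregate query on the local database.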
The original federated query is transformed into a query that is usable by the local database to query across all of the new tables. This query is called the "aggregate query" since it is performed after all of the table information is aggregated on the local database. From there, the federated architecture lets the local database handle the query processing for the federated query. The results from the local database are then returned to the user. The results that are sent back to the user appear as if they came from a single database that contains all of the combined information across the entire database federation. Once the results are sent back to the user, the local database no longer needs to store the retrieved table information. The tables are subsequently dropped to make sure the local database is not congested with old table data. This completes the federation platform's execution of a federated query. The following figure demonstrates the transactions that occur for a federated query.

Figure 2. The transactions during a federated query.

V. Architecture

ClassMapperRepository

Before the federation platform can be used, the platform must have access to the ClassMaps of each respective database in the federation. As mentioned earlier, the ClassMaps provide a high level description of the database and the means of connectivity to the database. As the architecture stands, the ClassMaps are read from local files on the same machine as the application, but the platform has the capability to read the ClassMaps from practically any source, whether passed as a Java String object or fetched from a remote server. This ability comes from Java's built-in connectivity libraries.
When the ClassMaps are loaded into the ClassMapRepository object, the header that contains the connectivity information is read first. The header is written such that the federation platform knows everything it needs to connect to the database. An example of one of the headers is as follows:

------ CLASSMAPPER INFO ------
-DATABASEALIAS: GATC
-CONNECTIVITY: InformixJDBC
-DATABASEIP: 18.999.0.156
-PORT: 1013
-DATABASENAME: gatc
-ADDITIONALPARAMETERS: INFORMIXSERVER=BENFU
-AUTHENTICATION(user): informix
-AUTHENTICATION(password): F43m2#lm.y
------------------------------

Figure 3. An example connectivity header for the GATC ClassMap.

After the header information is read, the repository continues to read the file and begins associating tables with their database name. The table names are stored with their corresponding database names in a hash table. In a hash table, the structure holds a list of keys with one corresponding value object per key. Hash tables are used for looking up values based on key values. The structure does not allow identical keys, so it is guaranteed to have only one value for a single key. An identical value, however, can be associated with multiple keys. A hash table is structured in such a way that key-value lookups take constant expected time [46]. That is, as the list of values grows, the lookup time stays essentially the same. Another added benefit is that memory space grows only linearly as more values are added. This behavior makes hash tables ideal for scaling fast lookups on large data sets. Since one of the design goals was to scale to a high number of databases, the use of a hash table was natural. In the ClassMapRepository object, the designated hash table is used for fast lookups of table-to-database mappings and for conflict notification in the event that more than one database has the same table name.
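The table-to-database mapping and conflict check just described might look like the following sketch. The class and method names are hypothetical; only the use of a hash table keyed by table name is taken from the text.

```java
import java.util.*;

// Sketch of the table-to-database hash table described above; a duplicate
// table name registered by a second database is flagged as a conflict.
public class ClassMapRegistry {
    private final Hashtable<String, String> tableToDatabase = new Hashtable<>();
    private final List<String> conflicts = new ArrayList<>();

    // Register every table of a loaded ClassMap under its database alias.
    public void register(String databaseAlias, Collection<String> tables) {
        for (String table : tables) {
            String previous = tableToDatabase.putIfAbsent(table, databaseAlias);
            if (previous != null && !previous.equals(databaseAlias)) {
                // two databases claim the same table name -- notify the user
                conflicts.add(table + " in both " + previous + " and " + databaseAlias);
            }
        }
    }

    public String databaseOf(String table) { return tableToDatabase.get(table); }

    public List<String> conflicts() { return conflicts; }

    public static void main(String[] args) {
        ClassMapRegistry registry = new ClassMapRegistry();
        registry.register("GATC", List.of("CHIP_DESIGN", "EXPERIMENT"));
        registry.register("HGDB", List.of("GENE", "EXPERIMENT"));
        System.out.println(registry.databaseOf("GENE"));
        System.out.println(registry.conflicts());
    }
}
```

Each lookup is a single constant-expected-time hash probe, which is what lets the repository scale as the number of registered databases grows.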
As each ClassMap is loaded, it adds each table to the hash table and associates it with a mapping to the database that contains it. A ClassMap object is instantiated per ClassMap loaded to hold its connection and authentication information. This continues until all tables are mapped from each of the processed ClassMaps. Once all of the tables are processed into the ClassMapRepository, the system moves on to constructing a DistributedQuery object. The repository is retained within the federated platform since it is referenced several times to find the mappings between tables during the execution of the federated query. However, the ClassMapRepository object is not modified later; it is only accessed to pull out database-table mappings.

DistributedQuery (Data Structure)

Once the repository has been established, the platform begins to construct its main data structure, a DistributedQuery object. At the creation of the object, only the federated query (in String form) is set in the object. In addition, the DistributedQuery data structure is used throughout the architecture to store the ClassMapRepository object, the transformed federated queries, and the decomposed database queries. Detailed descriptions of when the different members are used are given in the following sections. The structure of the DistributedQuery object can be seen in the following figure:

Figure 4. The DistributedQuery data structure object.
The DistributedQuery object contains three String versions of the query: the original federated query, the federated query mapped with its database paths (DBPath), and the "aggregate query", the final query that is used against the local database. The different modules in the system that set these last two members are mentioned later in this section. In addition to the transformed queries, the DistributedQuery object contains an object oriented structure for the decomposed queries. It was decided that for an acceptable object oriented design, the hash table data structure would be used. As mentioned above in the ClassMapRepository section, using a hash table allows the system to scale appropriately as more values are added. This scalability is applicable to decomposed queries since, in theory, as more databases are connected to the federation, the number and complexity of the decomposed queries will grow. In designing how the data structures would be used in the system, it was determined that each table query should be its own object. Since each table accessed by a federated query has to be partially reconstructed on the local database as a new table, it seemed logical to handle each transaction on a "per-table" basis. Thus, the SQLTableQuery object was used to encapsulate this abstraction. Each SQLTableQuery object contains a list of SELECT, FROM, and WHERE arguments as well as methods to modify the lists. The current state of the architecture only accepts these three SQL clauses. However, since the design of the system is modular such that each clause is stored as a list of arguments, adding more SQL vocabulary to the SQLTableQuery only requires adding another list of arguments. This design provides relatively quick data structure upgrades for expanding the querying capabilities to the end-databases in the federation.
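The clause-list design described above can be sketched as follows. This is an illustrative simplification of the SQLTableQuery idea, not the thesis's actual class; the rendering method and field names are assumptions.

```java
import java.util.*;

// Sketch of the per-table query object described above: one list of
// arguments per SQL clause, so new vocabulary means one more list.
public class TableQuerySketch {
    public final String databaseName;
    public final String tableName;
    public final List<String> selectArgs = new ArrayList<>();
    public final List<String> fromArgs = new ArrayList<>();
    public final List<String> whereArgs = new ArrayList<>();

    public TableQuerySketch(String databaseName, String tableName) {
        this.databaseName = databaseName;
        this.tableName = tableName;
        fromArgs.add(tableName);
    }

    public void select(String column) { selectArgs.add(column); }
    public void where(String condition) { whereArgs.add(condition); }

    // Render the end-database query from the clause lists.
    public String toSql() {
        String cols = selectArgs.isEmpty() ? "*" : String.join(", ", selectArgs);
        String sql = "SELECT " + cols + " FROM " + String.join(", ", fromArgs);
        if (!whereArgs.isEmpty()) sql += " WHERE " + String.join(" AND ", whereArgs);
        return sql;
    }

    public static void main(String[] args) {
        // per-database grouping, as in SQLMonoDBQuery:
        // database name -> (table name -> table query)
        Map<String, Map<String, TableQuerySketch>> byDatabase = new HashMap<>();
        TableQuerySketch q = new TableQuerySketch("GATC", "CHIP_DESIGN");
        q.select("NAME");
        q.where("CHIP_DESIGN.ID = 7");
        byDatabase.computeIfAbsent(q.databaseName, k -> new HashMap<>())
                  .put(q.tableName, q);
        System.out.println(byDatabase.get("GATC").get("CHIP_DESIGN").toSql());
    }
}
```

The nested map in `main` mirrors the DistributedQuery-to-SQLMonoDBQuery-to-SQLTableQuery grouping: two hash lookups reach any table query.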
In the event of an upgrade, more logic will have to be programmed into the query processor(s) of the architecture to accommodate the expanded features in the SQL vocabulary. Moving to a higher level of abstraction is the SQLMonoDBQuery object. Each end-database in the database federation is designated a SQLMonoDBQuery object, which contains all of the SQLTableQuery objects used to query against that specific database. Grouping queries by database allows for easier management, navigation, and lookup of queries. Every SQLMonoDBQuery object stores each of its SQLTableQuery objects as values in a hash table, with the corresponding key being the name of the table retrieved. Using the table name as a key to the hash table allows for fast lookup of the SQLTableQuery object that is used to retrieve the table. Moving higher up in the abstraction level brings us to the DistributedQuery object. Similar to the SQLMonoDBQuery-to-SQLTableQuery relationship, the DistributedQuery object contains a hash table with database names as the keys and the SQLMonoDBQuery objects as the values. This again allows for fast lookups of SQLMonoDBQuery objects. By using this design for the data structure, the federation platform can quickly access all SQLTableQuery objects with logical groupings. As will be seen later in the explanation of the architecture, parts of the federation platform utilize these groupings to simplify the end-database query processing.

QueryDecomposer

After the ClassMaps have been processed, the system is ready to accept federated queries. When the client submits a federated query, the platform begins by first instantiating a QueryDecomposer object and passing the query (in String form) to it. The QueryDecomposer object then constructs a DistributedQuery object and immediately stores the federated query in the data structure. From this point on, only the DistributedQuery is passed between the different modules.
Before the DistributedQuery object is passed to the next module, the QueryDecomposer makes a transformed copy of the federated query and stores it in the DistributedQuery object. More specifically, the decomposer transforms the query so that each table is prefixed with a path to its respective database (the DBPath query). The DBPath prefix is simply the database name followed by the "dash-greater-than" characters, which resemble an arrow. For each table referenced in the query statement, the QueryDecomposer performs a lookup on the ClassMapRepository to determine which database the table belongs to. As mentioned in the previous section, the transformed references have the form [DatabaseName]->[TableName] or [DatabaseName]->[TableName].[ColumnName]. For example, the table CHIP_DESIGN stored in the GATC database would be transformed into GATC->CHIP_DESIGN in the new query. If the query references a specific column in the database, the prefix stays the same. Therefore, a reference to the column CHIP_DESIGN.NAME in the GATC database would become GATC->CHIP_DESIGN.NAME in the DBPath query. If the transformation is successful, the DBPath query is stored in the DistributedQuery object. In the event that a table cannot be found in the ClassMapRepository, the table name in the DBPath query is replaced with the string [NOT IN CLASSMAP]. Malformed references are also flagged with a [MALFORMED] string. This allows the user of the platform to identify syntax and spelling mistakes in the query's references.

SQLQueryParser

After the QueryDecomposer has finished inserting the DBPath query into the DistributedQuery object, it passes the data structure to an instantiated SQLQueryParser object. The SQLQueryParser begins the task of parsing the DBPath query.
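The DBPath rewrite performed by the QueryDecomposer can be sketched as a single lookup-and-prefix step. Here the ClassMapRepository lookup is modeled as a plain table-name-to-database map; the class and method names are assumptions, and only the [NOT IN CLASSMAP] marker from the text is reproduced.

```java
import java.util.*;

// Sketch of the DBPath rewrite: look up a table's database and prefix the
// reference with "Database->". The repository is modeled as a plain map.
public class DBPathMapper {
    private final Map tableToDb; // table name -> database name

    public DBPathMapper(Map tableToDb) { this.tableToDb = tableToDb; }

    // Rewrite a table or column reference such as "CHIP_DESIGN.NAME"
    // into its DBPath form "GATC->CHIP_DESIGN.NAME".
    public String toDBPath(String reference) {
        int dot = reference.indexOf('.');
        String table = (dot < 0) ? reference : reference.substring(0, dot);
        String db = (String) tableToDb.get(table);
        if (db == null) {
            // Unknown table: replace the table name with the marker string.
            return "[NOT IN CLASSMAP]" + (dot < 0 ? "" : reference.substring(dot));
        }
        return db + "->" + reference;
    }
}
```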
The decision to have the parser work from the DBPath query rather than the original federated query was made for two reasons: 1) to decrease the dependency of classes in the federation platform on the ClassMapRepository, and 2) to give the user the option to manually query the federation platform with a DBPath query instead of a normal query. The first reason was for a "cleaner" design; the second, for access versatility for the user of the system. Once the SQLQueryParser receives the DistributedQuery containing the DBPath query, it runs through several steps to break the query apart into end-database queries. The SQLQueryParser object first separates the DBPath query by its SQL clauses. In the current implementation, the SELECT, FROM, and WHERE clauses are separated. Once the clauses are separated, table names are extracted from each clause. If a DBPath for a table or column is malformed in a query, the class recognizes the syntax error and prints it to the system console. Table names are extracted from the FROM clause first because, in proper SQL, all of the tables accessed by a query must be given in the FROM clause. The query form for the federation platform is the same: the SQLQueryParser produces an error and does not process the DBPath query if it finds tables in other clauses that are not explicitly declared in the FROM clause. Since the queries conform to this standard format, the list of tables that follows the FROM clause is used to enumerate all of the tables to be accessed by the federated query. For each table in the enumeration, the SQLQueryParser object instantiates a new SQLTableQuery object designated to handle the query that will return all of the required results from that table.
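The first parsing step, enumerating the tables listed in the FROM clause of a DBPath query, might look like the sketch below. It assumes the fixed SELECT/FROM/WHERE shape described above and does no error handling; the class and method names are assumptions.

```java
import java.util.*;

// Sketch of the FROM-clause enumeration step: isolate the text between
// " FROM " and " WHERE " (or the end of the query) and split out the
// DBPath-qualified table names.
public class FromClauseParser {

    // Returns the list of DBPath-qualified tables in the FROM clause,
    // e.g. "GATC->chip_design".
    public static List fromTables(String dbPathQuery) {
        String upper = dbPathQuery.toUpperCase();
        int from = upper.indexOf(" FROM ");
        int where = upper.indexOf(" WHERE ");
        String clause = (where < 0)
            ? dbPathQuery.substring(from + 6)
            : dbPathQuery.substring(from + 6, where);
        List tables = new ArrayList();
        StringTokenizer tok = new StringTokenizer(clause, ", ");
        while (tok.hasMoreTokens()) tables.add(tok.nextToken());
        return tables;
    }
}
```

Each enumerated table would then receive its own SQLTableQuery, as described above.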
That is, each constructed SQLTableQuery object effectively stores a query whose results all come from one table chosen from the enumeration. As each table in the enumeration is given a SQLTableQuery, the parser uses the DBPaths to sort where each object goes: the parser navigates through the DistributedQuery data structure to insert the SQLTableQuery object into its appropriate SQLMonoDBQuery. At the time of insertion, only the FROM clauses of the SQLTableQuery objects are set; the SELECT and WHERE clauses are initialized next. The parsing of the SELECT clause immediately follows the parsing of the FROM clause. The tables are again enumerated and processed individually until all table SELECTs are accounted for. During this process, the system checks that all tables accessed are declared in the FROM clause. As each SELECT column is processed, the SQLQueryParser object extracts the DBPath and table name of the column and searches the DistributedQuery data structure for the appropriate SQLTableQuery object, the one designated to return all of the results from that table. Once the SQLTableQuery object is found, the column is inserted into the SELECT clause of the object (with the DBPath prefix removed). This assures that no columns are missed by the aggregate query at the end of the platform's execution of the federated query. Following the parsing of the SELECT clause comes the WHERE clause. The SQL vocabulary of the current implementation only allows multiple equality or inequality conditions. Sub-queries and table aliases cannot be used as of yet. In addition, the order of evaluation goes from left to right instead of SQL's normal AND-then-OR order [39]. The logic by which the parser decomposes the WHERE conditions can be simplified into three rules.
The action that the SQLQueryParser takes depends on the type of condition. The rules are as follows:

1) If the WHERE condition involves only one table, insert the condition into the WHERE clause of the SQLTableQuery object that handles the results of the column's table. Additionally, insert the specified column into the SELECT clause of the object. The condition has its DBPaths stripped before it is inserted. [Example: the condition HGDB->AccessObject.submitter = "Ben Fu" is stripped to AccessObject.submitter = "Ben Fu", then inserted into the AccessObject SQLTableQuery object: into its WHERE as AccessObject.submitter = "Ben Fu" and into its SELECT as AccessObject.submitter.]

2) If the WHERE condition involves two different tables in a single database, insert the condition into the WHERE clauses of both SQLTableQuery objects that handle the results of the columns' tables. Additionally, insert the specified columns into the SELECT clauses of their respective objects. The FROM clause of each accessed SQLTableQuery must contain both table names. The condition has its DBPaths stripped before it is inserted. [Example: the condition HGDB->AccessObject.submitter = HGDB->Contact.displayName is stripped to AccessObject.submitter = Contact.displayName, then inserted into both the AccessObject and Contact SQLTableQuery objects, each receiving AccessObject.submitter = Contact.displayName in its WHERE; the column AccessObject.submitter is inserted into the SELECT of the AccessObject SQLTableQuery object and the column Contact.displayName into the SELECT of the Contact SQLTableQuery object; both FROM clauses contain AccessObject, Contact.]

3) If the WHERE condition involves two different tables on two different databases, do not insert the condition into any SQLTableQuery object; the aggregate query will enforce this condition at the end. Insert the specified columns into the SELECT clauses of their respective objects.
[Example: the condition HGDB->AccessObject.name = GATC->biological_item.item_name is stripped to AccessObject.name = biological_item.item_name; the column AccessObject.name is inserted into the SELECT of the AccessObject SQLTableQuery object and the column biological_item.item_name into the SELECT of the biological_item SQLTableQuery object.]

Because the DistributedQuery object groups the SQLTableQuery objects by database in a list of SQLMonoDBQuery objects, the DBPath information is preserved in the object relationships. Query optimizations were investigated in [47] [48]. It was determined that many of these optimizations were beyond the scope of the immediate design goals of the current system. However, for the sake of increasing speed and efficiency, optimizations should be kept in mind during future development of the federation platform. Finally, when all of the conditions in the WHERE clause of the DBPath query are processed, the DistributedQuery object contains all of the SQLTableQuery objects, or decomposed end-database queries, needed to perform the federated query. The federation platform then moves on to querying the end-databases.

DBDelegator

Once the federated query is decomposed and the DistributedQuery is populated with the individual queries, the federation platform instantiates a DBDelegator object. The DBDelegator handles the connectivity and query execution of all databases in the federation platform. In addition, the delegator object handles the connectivity, result insertion, and query execution of the local database. It does this by interfacing with handler objects specific to each database. Each member in the database federation has its connectivity information encapsulated in the header of its ClassMap. To establish connectivity to each database, the DBDelegator begins by instantiating a specific handler object for each database. Each handler is created based on the parameters specified by its ClassMap.
Similarly, a handler for the local database is instantiated, except that its parameters are hardcoded into the system rather than read in from ClassMaps. These handlers are designated to be the connectivity interface between the database and the DBDelegator. For the current implementation of the system, an InformixJDBCHandler class was created. The reasons why JDBC was used for connectivity are mentioned in the previous section. The handler class was tailored to connect to Informix databases because all of the database systems used in the current implementation were Informix databases. However, the only portions of the InformixJDBCHandler class that are specific to Informix are the connectivity parameters; the rest of the class is generalizable to any database that can handle JDBC calls. This handler object will be discussed in detail following the explanation of the DBDelegator. Once the handlers are initialized for the local database and the databases in the federation (remote databases), the DBDelegator begins processing the end-database queries. The delegator object starts the process by requesting an enumeration of all registered databases from the DistributedQuery object. The corresponding SQLMonoDBQuery object for the first database is looked up from the DistributedQuery. Once the object is found, the DBDelegator then uses the database's handler to begin querying the end-database. Each SQLTableQuery object in the SQLMonoDBQuery is converted into a SQL query string and sent to the handler. After the query is executed, the handler returns a java.sql.ResultSet object that encapsulates all of the data and metadata of the returned results. For each result set returned by the handler, the delegator passes the ResultSet to the local database's handler. The insertion is performed by methods in the handler that create a new table and place the data into it. The name of the new table has the form [Database]_[TableName].
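The delegation loop described above can be sketched with connectivity abstracted behind a small handler interface. The interface and the map-based bookkeeping are assumptions of this sketch; the real system uses SQLMonoDBQuery groupings and JDBC handlers.

```java
import java.util.*;

// Sketch of the delegation loop: run every decomposed table query against
// its remote handler and copy the results into the local database under the
// [Database]_[TableName] naming scheme.
interface Handler {
    Object executeQuery(String sql);                 // stands in for a JDBC query
    void insertResultSet(String localTable, Object resultSet);
}

public class DBDelegatorSketch {
    // tableQueriesByDb: database name -> (table name -> SQL string)
    // remoteHandlers:   database name -> Handler
    public static void delegate(Map tableQueriesByDb, Map remoteHandlers,
                                Handler localHandler) {
        for (Iterator dbs = tableQueriesByDb.keySet().iterator(); dbs.hasNext();) {
            String db = (String) dbs.next();
            Handler remote = (Handler) remoteHandlers.get(db);
            Map tableQueries = (Map) tableQueriesByDb.get(db);
            for (Iterator tables = tableQueries.keySet().iterator(); tables.hasNext();) {
                String table = (String) tables.next();
                Object rs = remote.executeQuery((String) tableQueries.get(table));
                // Local table name: database name, underscore, table name.
                localHandler.insertResultSet(db + "_" + table, rs);
            }
        }
    }
}
```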
For example, results taken from the AccessObject table in the HGDB database would cause the local handler to create a new table called HGDB_AccessObject. Naming tables in this way assures that no newly created table names conflict, while making it easy to identify where the data came from. This process is repeated for all of the remote databases. Finally, when all of the decomposed queries have been executed, the DBDelegator object transforms the DBPath query into an "aggregate query". That is, it converts the DBPath query into the query that is run against the local database once all of the remote data has been aggregated. This query is equivalent to the original federated query, with the only change being that all table names are prefixed with the database name and an underscore (the table-naming scheme above), because the tables on the local database now contain the remote information under new table names that comply with that scheme. The figure below shows an example transaction handled by the DBDelegator.

[Figure 5. The DBDelegator object handling a simple query. A decomposed SQLTableQuery is converted to a SQL string via toSQLString() (e.g. SELECT chip_design.id, chip_design.name FROM chip_design WHERE chip_design.id = '375173') and executed against the GATC database through its Informix JDBC handler; the ResultSetMetaData extracted from the returned java.sql.ResultSet drives a CREATE TABLE GATC_chip_design (name VARCHAR(32)) statement on the local database; the aggregate query SELECT GATC_chip_design.name FROM GATC_chip_design WHERE GATC_chip_design.id = '375173' is then run against the local database.]

The results of the final aggregate query are sent back to the user and the tables in the local database are dropped. The connections to the databases are then closed, which marks the end of the federated query execution.
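Because the aggregate query differs from the DBPath query only in that each "Database->Table" reference becomes the local table name "Database_Table", the rewrite can be sketched as a simple separator substitution (the helper class and method name are assumptions):

```java
// Sketch of the DBPath-to-aggregate-query rewrite: every "->" separator is
// replaced by the underscore used in the local table-naming scheme, leaving
// the rest of the query untouched.
public class AggregateQueryRewriter {
    public static String toAggregateQuery(String dbPathQuery) {
        StringBuffer sb = new StringBuffer();
        int i = 0;
        while (true) {
            int arrow = dbPathQuery.indexOf("->", i);
            if (arrow < 0) {
                sb.append(dbPathQuery.substring(i));
                break;
            }
            sb.append(dbPathQuery.substring(i, arrow)).append('_');
            i = arrow + 2; // skip past the "->" separator
        }
        return sb.toString();
    }
}
```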
JDBCHandler

The JDBCHandler is an object that the federation platform uses to connect to end-databases. The handler used in the current implementation, called the InformixJDBCHandler, was made to handle Informix databases. As mentioned before, the main methods in the class are generalizable since JDBC is used; the only portions specific to Informix are some members that Informix requires for connectivity. The main function of the JDBCHandler object is to encapsulate connectivity in a simple object that can be instantiated. With such a handler, retrieving and copying result sets are simplified into single method calls. Moreover, once the handler is instantiated and connected to the database, the federation platform can always access the database without having to create a new connection for each use. To retrieve an object that contains the information returned from a query (a java.sql.ResultSet object), only the SQL statement string has to be passed into the getResultSet() method of the JDBCHandler object. The JDBCHandler handles all of the database authentication and connectivity. To insert a result set into the local database, only the ResultSet object from another database needs to be passed into the insertResultSet() method of the handler. The JDBCHandler then looks through the metadata of the ResultSet object and generates a SQL statement to create a table with same-typed and same-named columns on the local database. From there, each row is copied one field at a time until the entire contents of the original ResultSet object are transferred. This is all performed transparently to the owner of the JDBCHandler; all that needs to be known is that a copy of the ResultSet now resides on the local database. Through the use of the JDBCHandler object, access to tables within the federation platform is greatly simplified.
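The CREATE TABLE statement that insertResultSet() derives from the result set's metadata can be illustrated by factoring the string-building step out so it runs without a live connection. The helper class and its signature are assumptions; the column names and type names stand in for the values a ResultSetMetaData would report through getColumnName() and getColumnTypeName().

```java
// Sketch of the metadata-driven CREATE TABLE generation inside
// insertResultSet(): one same-named, same-typed column per column of the
// source result set.
public class CreateTableBuilder {
    // columnNames/typeNames correspond to ResultSetMetaData columns 1..n.
    public static String createTableSQL(String tableName, String[] columnNames,
                                        String[] typeNames) {
        StringBuffer sb = new StringBuffer("CREATE TABLE " + tableName + " (");
        for (int i = 0; i < columnNames.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(columnNames[i]).append(' ').append(typeNames[i]);
        }
        return sb.append(")").toString();
    }
}
```

The handler would execute this statement on the local database, then copy the rows over one field at a time, as described above.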
This interface reduces the transactions to a few simple methods that separate the owner of the handler object from the database connectivity. This greatly reduces clutter in the code as well as the number of errors that could result from improper JDBC operations. All of the system components come together to form the federation platform. The figure below is a diagram of its architecture.

[Figure 6. The federation platform architecture. The client submits queries to the FederationPlatform, whose QueryDecomposer, SQLQueryParser, ClassMapRepository, and DBDelegator cooperate as described above; the DBDelegator reaches the local database and the remote databases (GATC, HGDB, and others) over the Internet through SQL JDBC handlers, each remote database being described by a ClassMap produced by its ClassMapper; a non-Informix JDBC handler is a future feature.]

VI. Implementation

System: Sun E450 running Solaris 7
Database: Informix Dynamic Server 2000
JDBC driver: Informix (Type 4) JDBC driver version 2.20
Java version: Java 2 SDK, Standard Edition, v1.3
Network interface: 100BT Ethernet

ClassMaps

The ClassMap files used in the federation platform were the SQL ClassMaps from William Chuang's work. As mentioned before, the ClassMaps were extended to contain connectivity information for the databases. This connectivity information was stored as the header of the ClassMap file; the rest of the file was the SQL schema of the database. The federation platform parsed this information to extract table names from the database.

VII. Discussion

Bugs

The current implementation of the system was tested with single and cross-realm database queries. For most queries tested, the architecture was able to process the query from start to finish. However, during the testing trials of the application, several bugs appeared.
The bugs that could not be fixed are as follows:

Malformed ClassMap files

When a ClassMap file was malformed, the federation platform had a difficult time parsing it to determine the tables in a database. A small amount of error checking was incorporated to handle this, but during some executions the parser could not catch some malformed ClassMaps.

StringTokenizer Bug

The java.util.StringTokenizer class behaved strangely when it was used to parse ClassMap files. If a "CREATE TABLE tablename (...)" SQL statement ended with a carriage return immediately after the table name and the delimiter used was the "space" character, the table name would not be tokenized properly. If a space was inserted before the carriage return, or no carriage return was used, then the tokenizer functioned properly. When the tokens were printed to the screen, the table names looked the same whether or not a carriage return followed; however, a String equality test on the two failed. This bug may be documented in Sun's Java bug reports.

Large Data Sets

Since JDBC is known to perform poorly with very large data sets, fetching large sets of data from end-databases should be avoided. In the future, a solution can be implemented to handle very large data sets. It is reported in Javasoft's bug reports [50] that objects exceeding several megabytes in size have often produced strange behavior. As JDBC matures, this should become less of a problem. Future implementations of the system should avoid the use of large JDBC objects.

Dropping Tables

After the results are returned to the user, the federation platform attempts to drop the tables created in the local database. Currently, the database returns an error stating that it cannot drop the tables. This is likely due to locks still being held because of the write operations executed on the tables.
The DBDelegator.java class should be investigated to make sure that all locks on tables are released before the system requests that the tables be dropped.

Future Improvements

Threading capabilities

To increase the performance of the system, threading could be implemented. Specifically, the federation platform could spawn off threads to fetch data from the end-databases in parallel instead of fetching the data sequentially. Managing the threads could require a significant amount of work, but the overall system performance would increase dramatically, since the bottleneck in the system is the speed at which tables are fetched from the end-databases.

Security

The current implementation of the system lacks a solid security model. The authentication information for the end-databases could be readable by the outside world. Developing a more sophisticated authentication procedure would improve the security of the system in the future.

Query Optimization

There are many documents and papers describing different strategies for query optimization [48] [49]. The current implementation of the federation platform has a limited amount of optimization. However, to fine-tune the system so that it fetches only the required table information from the end-databases, query optimizations must be implemented in the system. This topic can get extremely complicated, since there is a large mathematical component involved in optimizing and handling data sets. Query optimizations should be incorporated in future implementations of the system to help increase its overall performance.

Deploying the Federation Platform

The Informix JDBC driver must be included in the classpath at runtime. To start the platform, the Java VM needs to run the FederationPlatform.class file. See the Appendix for the code files.

Bibliography

[1] N. Dao, P.J. McCormick, C.F. Dewey, Jr. The human physiome as an information environment. Annals of Biomedical Engineering. 2000.
[2] P.J. McCormick. Designing Object-Oriented Interfaces for Medical Data Repositories. M. Eng. thesis, MIT. 1999.
[3] J. Grimson, W. Grimson, D. Berry, G. Stephens, E. Felton, D. Kalra, P. Toussaint, and O.W. Weier. A CORBA-based Integration of Distributed Electronic Healthcare Records Using the Synapse Approach. IEEE Transactions on Information Technology in Biomedicine. September 1998, 124-138.
[4] M. Hakman and T. Groth. Object-Oriented Biomedical System Modeling - The Rationale. Computer Methods and Programs in Biomedicine, Vol 59. 1999. pp 1-17.
[5] C.C. Talbot Jr., A.J. Cuticchia. Human Mapping Databases. Current Protocols in Human Genetics 1.13.1-1.13.12. John Wiley & Sons, Inc. http://gdbwww.gdb.org/. 1999.
[6] Human Genome Project. Report of the Invitational DOE Workshop on Genome Informatics. http://www.ornl.gov/hgmis/publicat/miscpubs/bioinfo/inLrep2.html. April 1993.
[7] National Center for Biotechnology Information. GenBank Overview. http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html. 1999.
[8] John Macauley, Huajun Wang, and Nathan Goodman. A model system for studying the integration of molecular biology databases. BioInformatics Journal. Vol 14, No. 7, pp 575-582. 1998.
[9] Nathan Goodman, Steve Rozen, and Lincoln Stein. The Case for Componentry in Genome Information Systems. Meeting on Interconnection of Molecular Biology Databases, Stanford University, 1994.
[10] Tatiana A. Tatusova, Ilene Karsch-Mizrachi, and James A. Ostell. Complete genomes in WWW Entrez: data representation and analysis. BioInformatics Journal. Vol 15, No. 7/8, pp 536-543. 1999.
[11] Jens Hanke, Gerrit Lehmann, Peer Bork, and Jens G. Reich. Associative database of protein sequences. BioInformatics Journal. Vol 15, No. 9, pp 741-748. 1999.
[12] E. Barillot, U. Leser, P. Lijnzaad, C. Cussat-Blanc, K. Jungfer, F. Guyon, G. Vaysseix, C. Helgesen, and P. Rodriguez-Tomé. A Proposal for a Standard CORBA Interface for Genome Maps. BioInformatics Journal. Vol 15, No. 2, pp 157-169. 1999.
[13] Oak Ridge National Laboratory. A Distributed Consortium for High-Throughput Analysis and Annotation of Genomes. http://grail.lsd.ornl.gov/gac/. 1999.
[14] Human Genome Project. Genome Glossary. http://www.ornl.gov/hgmis/publicat/glossary.html. July 1999.
[15] Bruce Birren, Eric Green, Phil Hieter, Sue Klapholz, and Rick Myers, eds. Genome Analysis: A Laboratory Manual. Cold Spring Harbor Laboratory Press, 1996.
[16] Stanford Human Genome Center. Human and Saccharomyces Genome Glossaries. http://www-shgc.stanford.edu/About/faq/glossary.html, http://genome-www.stanford.edu/Saccharomyces/help/glossary.html.
[17] LaTanya Sweeney. Towards the Optimal Suppression of Details when Disclosing Medical Data. Proceedings of MEDINFO 98, International Medical Informatics Association. Seoul, Korea. North-Holland. p. 1157, 1998.
[18] LaTanya Sweeney. Datafly: A System for Providing Anonymity in Medical Data. Database Security, XI: Status and Prospects (T. Lin and S. Qian, eds.), Elsevier Science, Amsterdam. Chapter 22, 1998.
[19] LaTanya Sweeney. Replacing Personally-Identifying Information in Medical Records, the Scrub System. Proceedings, Journal of the American Medical Informatics Association (J.J. Cimino, ed), Washington, DC: Hanley & Belfus, Inc., pp. 333-337, 1996.
[20] http://cbil.humgen.upenn.edu/epodb/
[21] W. Chuang. Design of a Genetics Database for Medical Research. M. Eng. thesis, MIT. 2000.
[22] S. K. Moore. Harmonizing Data, Setting Standards. IEEE Spectrum, vol. 38, issue 1, pp. 111-112. January 2001.
[23] J. Kohler, M. Lange, R. Hofestadt, S. Schulze-Kremer. Logical and Semantic Database Integration. Bio-Informatics and Biomedical Engineering, pp. 77-80. Proceedings of the IEEE International Symposium, Nov. 8-10, 2000.
[24] A. Freier, R. Hofestadt, M. Lange, and U. Scholz. MARGBench - An Approach for Integration, Modeling and Animation of Metabolic Networks. Proceedings of the German Conference on Bioinformatics, Hannover, 1999.
[25] P. D. Karp.
A Strategy for Database Interoperation. Journal of Computational Biology, vol. 2, pp. 573-586, 1995.
[26] B. Reinwald, H. Pirahesh, G. Krishnamoorthy, G. Lapis, B. Tran, S. Vora. Heterogeneous query processing through SQL table functions. Proceedings of the 15th International Conference on Data Engineering, pp. 366-373, 1999.
[27] G.J.L. Kemp, N. Angelopoulos, P.M.D. Gray. A schema-based approach to building a bioinformatics database federation. Proceedings of the IEEE International Symposium on BioInformatics and Biomedical Engineering, 2000.
[28] R. J. Robbins. Bioinformatics: Essential infrastructure for global biology. Journal of Computational Biology, issue 3, pp. 465-478, 1996.
[29] M. Garcia-Solaco, M. Castellanos, F. Saltor. Discovering interdatabase resemblance of classes for interoperable databases. Research Issues in Data Engineering, 1993: Interoperability in Multidatabase Systems, 1993.
[30] A. Dogac, C. Dengi, E. Kilic, G. Ozhan, F. Ozcan, S. Nural, C. Evrendilek, U. Halici, B. Arpinar, P. Koksal, S. Mancuhan. A multidatabase system implementation on CORBA. Research Issues in Data Engineering, pp. 2-11, 1996.
[31] K. Hergula, T. Harder. A middleware approach for combining heterogeneous data sources - integration of generic query and predefined function access. Web Information Systems Engineering, pp. 26-33, vol. 1, 2000.
[32] D.D. Karunaratna, W.A. Gray, N.J. Fiddian. Exploitation of database meta-data in assisting database interoperation. IEE Colloquium on Multimedia Databases and MPEG-7 (Ref. No. 1999/056), pp. 12/1-12/5, 1999.
[33] H. Huang, J. Kerridge, S. Chen. A query mediation approach to interoperability of heterogeneous databases. Proceedings of the 11th Australasian Database Conference, pp. 41-48, 2000.
[34] W. Meng and C. Yu. Query Processing in Multidatabase Systems. Modern Database Systems: The Object Model, Interoperability, and Beyond, pp. 551-572. Addison-Wesley, 1995.
[35] A. d'Ambrogio, G. Iazeolla.
A CORBA-based approach to design gateways for multidatabase systems. Enabling Technologies: Infrastructure for Collaborative Enterprises, pp. 49-54, 1997.
[36] O. Jautzy. Interoperable databases: a programming language approach. Proceedings of IDEAS '99, International Symposium on Database Engineering and Applications, pp. 63-71, 1999.
[37] S.B. Yoo, K.C. Kim, S.K. Cha. A middleware implementation of active rules for ODBMS. Proceedings of the 6th International Conference on Database Systems for Advanced Applications, pp. 347-354, 1999.
[38] A.R. Hurson, M.W. Bright, S.H. Pakzad. Multidatabase Systems: An Advanced Solution for Global Information Sharing. IEEE Computer Society Press, 1994.
[39] B. Forta. Sams Teach Yourself SQL in 10 Minutes. Sams Publishing, 2000.
[40] A. Elmagarmid, M. Rusinkiewicz, A. Sheth. Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann Publishers Inc., 1999.
[41] http://sdmc.krdl.org.sg:8080/kleisli/
[42] http://www.genetic-exchange.com/
[43] http://srs6.ebi.ac.uk/
[44] http://edradour.cs.uni-magdeburg.de/iti brm/marg/
[45] http://www.affymetrix.com/
[46] C.S. Horstmann, G. Cornell. Core Java 2, Volume 1: Fundamentals. Prentice Hall PTR/Sun Microsystems Press, 2000.
[47] J. Grant, J. Gryz, J. Minker, L. Raschid. Logic-based query optimization for object databases. IEEE Transactions on Knowledge and Data Engineering, vol. 12, issue 4, pp. 529-547, 2000.
[48] J. Claussen, A. Kemper, G. Moerkotte, K. Peithner, M. Steinbrunn. Optimization and evaluation of disjunctive queries. IEEE Transactions on Knowledge and Data Engineering, vol. 12, issue 2, pp. 238-260, 2000.
[50] http://www.javasoft.com/ 36 APPENDIX FederationPlatform.j ava import java.sql.*; import java.util.*; public class FederationPlatform private Vector registeredDatabaseAdapters; private QueryDecomposer queryDecomposer; public FederationPlatform() registeredDatabaseAdapters = new Vector(); queryDecomposer = new QueryDecomposer() public static void main(String args[l) FederationPlatform federationPlatform = new FederationPlatform() federationPlatform.initialize(args); public void initialize(String args[]) ClassMapRepository classMapRepository classMapRepository.processClassMapFile classMapRepository.processClassMapFile queryDecomposer. set ClassMapRepository = new ClassMapRepository() (args [0]) (args [1]); (classMapRepository) acceptQuery("PUT QUERY STRING HERE"); DBDelegator dbDelegator = new DBDelegator (queryDecomposer.getDistributedQuery()); dbDelegator.dropLocalTables(); public void acceptQuery(String queryString) queryDecomposer. processQuery (queryString); 37 ClassMapRepository.java import java.util.*; import java.io.*; public class ClassMapRepository Hashtable federatedClassMapHashtable; Hashtable classMapHashtable; public ClassMapRepository() federatedClassMapHashtable = new Hashtable() classMapHashtable = new Hashtable(); public Object putClassMap(String dbName, ClassMap classMap) //returns the previous value of the specified key in this did not have one. return classMapHashtable.put(dbName, classMap); public Object getClassMap(String hashtable, it dbName) //returns the value to which the key is mapped in this hashtable; not mapped to any value in this hashtable return classMapHashtable.get(dbName); public or null if Object removeClassMap(String null if the key is dbName) //returns the value to which the key had been mapped in this the key did not have a mapping. return classMapHashtable. 
remove (dbName) hashtable, or null if public Enumeration getDBEnumeration) return classMapHashtable.keys(; public Object putTableDB(String TableAlreadyExistsException!!! tableName, //returns the previous value of the did not have one. String dbName) specified key in this Object object = federatedClassMapHashtable.put(tableName, if (object != null) //throws hashtable, or null if it dbName); //throw TableAlreadyExistsException!!! return object; public Object getDBPath(String tableName) //throws TableDoesNotExistExeception!!! //returns the value to which the key is mapped in this hashtable; null if the key is not mapped to any value in this hashtable return federatedClassMapHashtable.get(tableName); public Object removeTableDB(String tableName) //throws TableDoesNotExistExeception!! //returns the value to which the key had been mapped in this the key did not have a mapping. return federatedClassMapHashtable. remove (tableName) 38 hashtable, or null if public Enumeration getTablesEnumeration() return federatedClassMapHashtable. 
keys(); private boolean parseClassMap (String classMapString) StringTokenizer stringLineTokenizer //tokenize go by lines try = new StringTokenizer(classMapString, ClassMap classMap = new ClassMap(; while (stringLineTokenizer.hasMoreTokens() && "\n"); (classMap.isAllParametersSet() true)) String currentLine = stringLineTokenizer.nextToken(); StringTokenizer stringWordTokenizer = new StringTokenizer(currentLine, //tokenize by spaces String currentWord; while (stringWordTokenizer.hasMoreElements() " currentWord = stringWordTokenizer.nextToken(); if ("DATABASEALIAS:".equalsIgnoreCase(currentWord)) currentWord = stringWordTokenizer.nextToken(); classMap.setDatabaseAlias(currentWord); else if ("CONNECTIVITY:" equalsIgnoreCase (currentWord)) currentWord = stringWordTokenizer.nextToken(); classMap.setConnectivity(currentWord); else if ("DATABASEIP: " equalsIgnoreCase (currentWord)) currentWord = stringWordTokenizer.nextToken(); classMap.setIP(currentWord); else if ("PORT: "equalsIgnoreCase (currentWord)) currentWord = stringWordTokenizer.nextToken(); classMap.setPort(currentWord); else if ("DATABASENAME:".equalsIgnoreCase(currentWord)) currentWord = stringWordTokenizer.nextToken(; classMap.setDatabaseName(currentWord); else if ("ADDITIONAL_PARAMETERS:". 
                    } else if ("ADDITIONAL_PARAMETERS:".equalsIgnoreCase(currentWord)) {
                        currentWord = stringWordTokenizer.nextToken();
                        classMap.setAdditionalParameters(currentWord);
                    } else if ("AUTHENTICATION(user):".equalsIgnoreCase(currentWord)) {
                        currentWord = stringWordTokenizer.nextToken();
                        classMap.setUser(currentWord);
                    } else if ("AUTHENTICATION(password):".equalsIgnoreCase(currentWord)) {
                        currentWord = stringWordTokenizer.nextToken();
                        classMap.setPassword(currentWord);
                    }
                }
            }
            putClassMap(classMap.getDatabaseAlias(), classMap);

            while (stringLineTokenizer.hasMoreTokens()) {
                String currentLine = stringLineTokenizer.nextToken();
                // tokenize by spaces
                StringTokenizer stringWordTokenizer = new StringTokenizer(currentLine, " ");
                String currentWord;
                while (stringWordTokenizer.hasMoreElements()) {
                    currentWord = stringWordTokenizer.nextToken();
                    if ("CREATE".equalsIgnoreCase(currentWord)) {
                        currentWord = stringWordTokenizer.nextToken();
                        if ("TABLE".equalsIgnoreCase(currentWord)) {
                            currentWord = stringWordTokenizer.nextToken();
                            // this should be the table name
                            currentWord.trim();
                            // there's a strange bug... make sure there's at least a space
                            // after the table name in the ClassMap file, or else the
                            // tableString won't be matched up in the hashtable during lookups
                            putTableDB(currentWord, classMap.getDatabaseAlias()); // <-- DatabaseALIAS, not DatabaseNAME
                        }
                    }
                }
            }
        } catch (NoSuchElementException e) {
            System.out.println("ERROR: NoSuchElementException:" + e.getMessage());
            return false;
        }
        return true;
    }

    public boolean processClassMap(String classMapString) {
        return parseClassMap(classMapString);
    }

    public boolean processClassMapFile(String fileName) {
        try {
            StringBuffer sb = new StringBuffer();
            FileReader fileReader = new FileReader(fileName);
            int currentread = fileReader.read();
            while (currentread != -1) {
                sb.append((char) currentread);
                currentread = fileReader.read();
            }
            return parseClassMap(sb.toString());
        } catch (FileNotFoundException e) {
            System.out.println("ERROR: File Not Found Exception:" + e.getMessage());
        } catch (IOException e) {
            System.out.println("ERROR: IOException:" + e.getMessage());
        }
        return false; // did not complete
    }
}

ClassMap.java

public class ClassMap {
    private String databaseAlias;
    private String databaseName;
    private String connectivity;
    private String IP;
    private String port;
    private String additionalParameters;
    private String user;
    private String password;

    public ClassMap() {
    }

    public void setDatabaseAlias(String databaseAlias) { this.databaseAlias = databaseAlias; }
    public String getDatabaseAlias() { return databaseAlias; }

    public void setDatabaseName(String databaseName) { this.databaseName = databaseName; }
    public String getDatabaseName() { return databaseName; }

    public void setConnectivity(String connectivity) { this.connectivity = connectivity; }
    public String getConnectivity() { return connectivity; }

    public void setIP(String IP) { this.IP = IP; }
    public String getIP() { return IP; }

    public void setPort(String port) { this.port = port; }
    public String getPort() { return port; }

    public void setAdditionalParameters(String additionalParameters) { this.additionalParameters = additionalParameters; }
    public String getAdditionalParameters() { return additionalParameters; }
    public void setUser(String user) { this.user = user; }
    public String getUser() { return user; }

    public void setPassword(String password) { this.password = password; }
    public String getPassword() { return password; }

    public boolean isAllParametersSet() {
        if ((databaseAlias != null) && (databaseName != null)
                && (connectivity != null) && (IP != null) && (port != null)
                && (user != null) && (password != null))
            return true;
        else
            return false;
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("databaseAlias="); sb.append(databaseAlias);
        sb.append(" databaseName="); sb.append(databaseName);
        sb.append(" connectivity="); sb.append(connectivity);
        sb.append(" IP="); sb.append(IP);
        sb.append(" port="); sb.append(port);
        sb.append(" additionalParameters="); sb.append(additionalParameters);
        sb.append(" user="); sb.append(user);
        sb.append(" password="); sb.append(password);
        return sb.toString();
    }
}

DistributedQuery.java

import java.util.*;

public class DistributedQuery {
    private String query;
    private String dbPathQuery;
    private String aggregateQuery;
    private Hashtable monoDBQueryHashtable;
    private Vector aggregateSelectVector;
    private Vector aggregateFromVector;
    private Vector aggregateWhereVector;
    private ClassMapRepository classMapRepository;

    public DistributedQuery() {
        query = "";
        monoDBQueryHashtable = new Hashtable();
        aggregateSelectVector = new Vector();
        aggregateFromVector = new Vector();
        aggregateWhereVector = new Vector();
    }

    public DistributedQuery(String queryString) {
        query = queryString;
        monoDBQueryHashtable = new Hashtable();
        aggregateSelectVector = new Vector();
        aggregateFromVector = new Vector();
        aggregateWhereVector = new Vector();
    }

    public void setClassMapRepository(ClassMapRepository classMapRepository) {
        this.classMapRepository = classMapRepository;
    }

    public ClassMapRepository getClassMapRepository() { return classMapRepository; }

    public void setQueryString(String queryString) { query = queryString; }
    public String getQueryString() { return query; }
    public void setDBPathQueryString(String dbPathQuery) { this.dbPathQuery = dbPathQuery; }
    public String getDBPathQueryString() { return dbPathQuery; }

    public void setAggregateQueryString(String aggregateQuery) { this.aggregateQuery = aggregateQuery; }
    public String getAggregateQueryString() { return aggregateQuery; }

    public Object putMonoDBQuery(String dbName, SQLMonoDBQuery dbQuery) {
        // returns the previous value of the specified key in this hashtable,
        // or null if it did not have one
        return monoDBQueryHashtable.put(dbName, dbQuery);
    }

    public Object removeMonoDBQuery(String dbName) {
        // returns the value to which the key had been mapped in this hashtable,
        // or null if the key did not have a mapping
        return monoDBQueryHashtable.remove(dbName);
    }

    public Object getMonoDBQuery(String dbName) { return monoDBQueryHashtable.get(dbName); }

    public Enumeration getMonoDBKeys() { return monoDBQueryHashtable.keys(); }

    public void addSelect(String selectString) { aggregateSelectVector.addElement(selectString); }

    public void removeSelect(String selectString) {
        // String must be exact same reference to be removed
        aggregateSelectVector.removeElement(selectString);
    }

    public Enumeration getSelectEnumeration() { return aggregateSelectVector.elements(); }

    public void addFrom(String fromString) { aggregateFromVector.addElement(fromString); }

    public void removeFrom(String fromString) {
        // String must be exact same reference to be removed
        aggregateFromVector.removeElement(fromString);
    }

    public Enumeration getFromEnumeration() { return aggregateFromVector.elements(); }

    public void addWhere(String whereString) { aggregateWhereVector.addElement(whereString); }

    public void removeWhere(String whereString) {
        // String must be exact same reference to be removed
        aggregateWhereVector.removeElement(whereString);
    }

    public Enumeration getWhereEnumeration() { return aggregateWhereVector.elements(); }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("-=DistributedQuery=- query=" + query + "\n");
        sb.append("dbPathQuery=" + dbPathQuery + "\n");
        sb.append("aggregateQuery=" + aggregateQuery + "\n");
        sb.append("---------------------------------\n");
        Enumeration dbKeys = getMonoDBKeys();
        while (dbKeys.hasMoreElements()) {
            String monoDBQueryString = (String) (dbKeys.nextElement());
            SQLMonoDBQuery monoDBQuery = (SQLMonoDBQuery) getMonoDBQuery(monoDBQueryString);
            sb.append(monoDBQuery.toString());
        }
        return sb.toString();
    }
}

SQLMonoDBQuery.java

import java.util.*;

public class SQLMonoDBQuery {
    // This data object holds all queries for one database

    private String databaseNameString;
    private Hashtable tableQueryHashtable;

    public SQLMonoDBQuery() {
        // tableQueryVector = new Vector();
        tableQueryHashtable = new Hashtable();
    }

    public void setDatabaseName(String databaseNameString) { this.databaseNameString = databaseNameString; }
    public String getDatabaseName() { return databaseNameString; }

    public Object putTableQuery(String tableName, SQLTableQuery tableQuery) {
        // returns the previous value of the specified key in this hashtable,
        // or null if it did not have one
        return tableQueryHashtable.put(tableName, tableQuery);
    }
    public Object removeTableQuery(String tableName) {
        // returns the value to which the key had been mapped in this hashtable,
        // or null if the key did not have a mapping
        return tableQueryHashtable.remove(tableName);
    }

    public Object getTableQuery(String tableName) { return tableQueryHashtable.get(tableName); }

    public Enumeration getTableKeys() { return tableQueryHashtable.keys(); }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        Enumeration tableKeys = getTableKeys();
        sb.append("-=SQLMonoDBQuery=- databaseName=" + databaseNameString + "\n");
        sb.append("---------------------------------------------------\n");
        while (tableKeys.hasMoreElements()) {
            String tableKeyString = (String) tableKeys.nextElement();
            SQLTableQuery tableQuery = (SQLTableQuery) getTableQuery(tableKeyString);
            sb.append(tableQuery.toString() + "\n\n");
        }
        return sb.toString();
    }
}

SQLTableQuery.java

import java.util.*;

public class SQLTableQuery {
    // This class is a data object that contains one query to a database

    private String tableNameString;
    private String DBNameString;
    private Vector selectVector;
    private Vector fromVector;
    private Vector whereVector;

    public SQLTableQuery() {
        selectVector = new Vector();
        fromVector = new Vector();
        whereVector = new Vector();
    }

    public void setTableName(String tableNameString) { this.tableNameString = tableNameString; }
    public String getTableName() { return tableNameString; }

    public void setDBName(String dbName) { this.DBNameString = dbName; }
    public String getDBName() { return DBNameString; }

    public int findElementIndex(Vector vector, String findString) {
        // returns the index of the element in the Vector that is equivalent to
        // the findString -- CASE SENSITIVE!!!
        // if no index is found, returns -1
        // if it finds the string only in the wrong case, returns -2
        boolean foundwrongcase = false;
        for (int i = 0; i < vector.size(); i++) {
            if (((String) (vector.elementAt(i))).equals(findString))
                return i;
            else if (((String) (vector.elementAt(i))).equalsIgnoreCase(findString))
                foundwrongcase = true; // exists, but in wrong case
        }
        if (foundwrongcase == true)
            return -2; // exists, but in wrong case
        else
            return -1; // not in vector
    }

    public int findElementIndexIgnoreCase(Vector vector, String findString) {
        // returns the index of the element in the Vector that is equivalent to
        // the findString; if no index is found, returns -1
        for (int i = 0; i < vector.size(); i++) {
            if (((String) (vector.elementAt(i))).equalsIgnoreCase(findString))
                return i;
        }
        return -1; // not in vector
    }

    public boolean addSelect(String selectString) {
        int elementindex = findElementIndexIgnoreCase(selectVector, selectString);
        if (elementindex == -1) { // if the string isn't in the vector
            selectVector.addElement(selectString);
            return true;
        }
        return false;
    }

    public boolean removeSelect(String selectString) {
        int elementindex = findElementIndexIgnoreCase(selectVector, selectString);
        if (elementindex != -1) {
            selectVector.remove(elementindex);
            return true;
        }
        return false;
    }

    public Enumeration getSelectEnumeration() { return selectVector.elements(); }

    public boolean addFrom(String fromString) {
        int elementindex = findElementIndexIgnoreCase(fromVector, fromString);
        if (elementindex == -1) { // if the string isn't in the vector
            fromVector.addElement(fromString);
            return true;
        }
        return false;
    }

    public boolean removeFrom(String fromString) {
        int elementindex = findElementIndexIgnoreCase(fromVector, fromString);
        if (elementindex != -1) {
            fromVector.remove(elementindex);
            return true;
        }
        return false;
    }

    public Enumeration getFromEnumeration() { return fromVector.elements(); }

    public boolean addWhere(String whereString) {
        int elementindex = findElementIndex(whereVector, whereString);
        if (elementindex == -1) { // if the string isn't in the vector
            whereVector.addElement(whereString);
            return true;
        }
        return false;
    }

    public boolean removeWhere(String whereString) {
        int elementindex = findElementIndex(whereVector, whereString);
        if ((elementindex != -1) && (elementindex != -2)) {
            whereVector.remove(elementindex);
            return true;
        }
        return false;
    }

    public Enumeration getWhereEnumeration() { return whereVector.elements(); }

    public String toSQLString() {
        StringBuffer sb = new StringBuffer();
        sb.append("SELECT ");
        for (int i = 0; i < selectVector.size(); i++) {
            sb.append((String) selectVector.elementAt(i));
            if (i < (selectVector.size() - 1))
                sb.append(", ");
            else
                sb.append("\n");
        }
        sb.append("FROM ");
        for (int i = 0; i < fromVector.size(); i++) {
            sb.append((String) fromVector.elementAt(i));
            if (i < (fromVector.size() - 1))
                sb.append(", ");
            else
                sb.append("\n");
        }
        if (whereVector.size() != 0) {
            sb.append("WHERE ");
            for (int i = 0; i < whereVector.size(); i++) {
                sb.append((String) whereVector.elementAt(i));
                if (i < (whereVector.size() - 1))
                    sb.append(", ");
                else
                    sb.append("\n");
            }
        }
        return sb.toString();
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("-=SQLTableQuery=- \n");
        sb.append("DBName=" + DBNameString + "\n");
        sb.append("TableName=" + tableNameString + "\n");
        sb.append("SELECT ");
        for (int i = 0; i < selectVector.size(); i++) {
            sb.append((String) selectVector.elementAt(i));
            if (i < (selectVector.size() - 1))
                sb.append(", ");
            else
                sb.append("\n");
        }
        sb.append("FROM ");
        for (int i = 0; i < fromVector.size(); i++) {
            sb.append((String) fromVector.elementAt(i));
            if (i < (fromVector.size() - 1))
                sb.append(", ");
            else
                sb.append("\n");
        }
        if (whereVector.size() != 0) {
            sb.append("WHERE ");
            for (int i = 0; i < whereVector.size(); i++) {
                sb.append((String) whereVector.elementAt(i));
                if (i < (whereVector.size() - 1))
                    sb.append(", ");
                else
                    sb.append("\n");
            }
        }
        return sb.toString();
    }
}

QueryDecomposer.java

import java.util.*;

public class QueryDecomposer {
    private Vector monoQueryVector;
    private SQLQueryParser queryParser;
    private DistributedQuery distributedQuery;
    private ClassMapRepository classMapRepository;

    public QueryDecomposer() {
        monoQueryVector = new Vector();
        queryParser = new SQLQueryParser();
    }

    public Vector getMonoDatabaseQueryVector() {
        return monoQueryVector; // return a clone of this?
    }

    public DistributedQuery getDistributedQuery() { return distributedQuery; }

    public ClassMapRepository getClassMapRepository() { return classMapRepository; }

    public void setClassMapRepository(ClassMapRepository classMapRepository) {
        this.classMapRepository = classMapRepository;
        if (distributedQuery != null)
            distributedQuery.setClassMapRepository(classMapRepository);
    }

    public boolean processQuery(String distributedQueryString) {
        distributedQuery = new DistributedQuery(distributedQueryString);
        distributedQuery.setClassMapRepository(classMapRepository);
        distributedQuery.setDBPathQueryString(addDBPathsToQuery(distributedQueryString));
        queryParser.parse(distributedQuery);
        return true; // needs to signal whether processing went through
    }

    private String addDBPathsToQuery(String queryString) {
        String queryUpperCaseString = queryString.toUpperCase();
        String selectString, fromString, whereString;
        String dbPathSelectString, dbPathFromString, dbPathWhereString;
        StringTokenizer stringTokenizer;
        StringBuffer sb;

        if (queryUpperCaseString.equals("")) {
            // ***throw an exception!
            System.out.print("Empty query string!");
            return "";
        }

        int selectindex = queryUpperCaseString.indexOf("SELECT"); // +7
        int fromindex = queryUpperCaseString.indexOf("FROM");     // +5
        int whereindex = queryUpperCaseString.indexOf("WHERE");   // +6

        if (selectindex == -1) { // no SELECT keyword
            System.out.println("Malformed Query! No SELECT keyword found.");
            return "";
        } else if (selectindex != 0) { // malformed query
            System.out.println("Malformed Query! SELECT keyword must be the first word.");
            return "";
        } else {
            selectString = queryString.substring(selectindex + 7, fromindex);
        }

        if (fromindex == -1) { // no FROM keyword
            System.out.println("Malformed Query! No FROM keyword found.");
            return "";
        } else if (whereindex != -1) { // there is a WHERE keyword
            fromString = queryString.substring(fromindex + 5, whereindex);
        } else { // no WHERE keyword
            fromString = queryString.substring(fromindex + 5, queryString.length());
        }

        if (whereindex == -1)
            whereString = "";
        else if ((whereindex + 6) == (queryUpperCaseString.length() - 1))
            whereString = "";
        else
            whereString = queryString.substring(whereindex + 6, queryString.length());

        // tokenize strings
        // SELECT: for selectString, test if each token is a key in the hashtable,
        // and add the DBPath to all tokens in the hashtable (identify by [TABLE].
        // [COLUMN])
        stringTokenizer = new StringTokenizer(selectString, ",");
        String currentSelect;
        sb = new StringBuffer();
        while (stringTokenizer.hasMoreTokens()) {
            currentSelect = stringTokenizer.nextToken();
            currentSelect = currentSelect.trim();
            if (getTablePath(currentSelect).equalsIgnoreCase("")) { // if TablePath not specified
                // throw an EXCEPTION! -- malformed SELECT clause
                System.out.println("ERROR! malformed SELECT clause");
                sb.append("[MALFORMED]");
            } else {
                String tableName = getTablePath(currentSelect);
                String columnName = cropTablePath(currentSelect);
                String dbName = (String) classMapRepository.getDBPath(tableName);
                if (dbName == null) { // table has no corresponding DB!
                    // throw an EXCEPTION!
                    System.out.println("ERROR! table has no corresponding DB!: tableName=" + tableName);
                    sb.append("[NOT IN CLASSMAPS]");
                } else {
                    sb.append(dbName + "->" + tableName + "." + columnName);
                }
            }
            if (stringTokenizer.hasMoreTokens())
                sb.append(", ");
        }
        dbPathSelectString = sb.toString();

        // FROM: for fromString, test if each token is a key in the hashtable, and
        // add the DBPath to all tokens in the hashtable (identify by [TABLE] only)
        stringTokenizer = new StringTokenizer(fromString, ",");
        String currentFrom;
        sb = new StringBuffer();
        while (stringTokenizer.hasMoreTokens()) {
            currentFrom = stringTokenizer.nextToken();
            currentFrom = currentFrom.trim();
            String dbName = (String) classMapRepository.getDBPath(currentFrom);
            if (dbName == null) { // table has no corresponding DB!
                // throw an EXCEPTION!
                System.out.println("ERROR! table has no corresponding DB!: tableName=" + currentFrom);
                sb.append("[NOT IN CLASSMAPS]");
            } else {
                sb.append(dbName + "->" + currentFrom);
            }
            if (stringTokenizer.hasMoreTokens())
                sb.append(", ");
        }
        dbPathFromString = sb.toString();

        // WHERE: for whereString, tokenize and find [TABLE].[COLUMN] tokens; if one
        // is a key in the hashtable, then add the DBPath
        stringTokenizer = new StringTokenizer(whereString, " ");
        String currentWhere;
        sb = new StringBuffer();
        while (stringTokenizer.hasMoreTokens()) {
            currentWhere = stringTokenizer.nextToken();
            currentWhere = currentWhere.trim();
            if (getTablePath(currentWhere).equalsIgnoreCase("") == false) { // qualifies as a TablePath
                String tableName = getTablePath(currentWhere);
                String columnName = cropTablePath(currentWhere);
                String dbName = (String) classMapRepository.getDBPath(tableName);
                if (dbName == null) { // table has no corresponding DB!
                    // throw an EXCEPTION!
                    System.out.println("ERROR! table has no corresponding DB!: tableName=" + tableName);
                    sb.append("[NOT IN CLASSMAPS]");
                } else {
                    sb.append(dbName + "->" + tableName + "." + columnName);
                }
            } else {
                sb.append(currentWhere);
            }
            if (stringTokenizer.hasMoreTokens())
                sb.append(" "); // to space out words
        }
        dbPathWhereString = sb.toString();

        sb = new StringBuffer();
        sb.append("SELECT ");
        sb.append(dbPathSelectString);
        sb.append(" \n FROM ");
        sb.append(dbPathFromString);
        sb.append(" \n WHERE ");
        sb.append(dbPathWhereString);
        return sb.toString(); // FINAL STRING!!!
    }

    private String getTablePath(String pathString) { // throws BadTableException?
        int pathindex = pathString.indexOf(".");
        if (pathindex == -1)
            return ""; // throw BadTableException
        else {
            String DBPathString = pathString.substring(0, pathindex);
            return DBPathString;
        }
    }
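The fully qualified column notation used throughout these listings is `DATABASE->TABLE.COLUMN`, taken apart with `indexOf`/`substring` helpers like the one above. The following minimal, self-contained sketch shows that decomposition in isolation; the class name `PathDemo` and the example path `GATC->EXPERIMENT.EXP_ID` are illustrative only, not part of the thesis code, though the helper logic mirrors the `getDBPath`/`cropDBPath`/`getTablePath`/`cropTablePath` methods of the federation classes.

```java
// Hypothetical demonstration class (not part of the thesis sources): splits a
// "DATABASE->TABLE.COLUMN" reference the same way the federation helpers do.
public class PathDemo {
    // Returns the database alias before "->", or "" if no "->" is present.
    static String getDBPath(String path) {
        int i = path.indexOf("->");
        return (i == -1) ? "" : path.substring(0, i);
    }

    // Strips the database alias and "->" from the front of the path.
    static String cropDBPath(String path) {
        int i = path.indexOf("->");
        return (i == -1) ? "" : path.substring(i + 2);
    }

    // Returns the table name before ".", or "" if no "." is present.
    static String getTablePath(String path) {
        int i = path.indexOf(".");
        return (i == -1) ? "" : path.substring(0, i);
    }

    // Strips the table name and "." from the front of the path.
    static String cropTablePath(String path) {
        int i = path.indexOf(".");
        return (i == -1) ? "" : path.substring(i + 1);
    }

    public static void main(String[] args) {
        String qualified = "GATC->EXPERIMENT.EXP_ID"; // hypothetical column reference
        System.out.println(getDBPath(qualified));                 // GATC
        System.out.println(getTablePath(cropDBPath(qualified)));  // EXPERIMENT
        System.out.println(cropTablePath(cropDBPath(qualified))); // EXP_ID
    }
}
```

Because an absent delimiter yields `""` rather than an exception, callers in the listings test for the empty string to decide whether a token is a qualified column at all (as in the WHERE-clause scan above).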
    private String cropTablePath(String pathString) { // throws BadTableException?
        int pathindex = pathString.indexOf(".");
        if (pathindex == -1)
            return ""; // throw BadTableException
        else {
            String croppedDBString = pathString.substring(pathindex + 1, pathString.length());
            return croppedDBString;
        }
    }
}

SQLQueryParser.java

import java.util.*;

public class SQLQueryParser {
    private String selectString;
    private String fromString;
    private String whereString;
    private String queryString;
    private String queryUpperCaseString;
    private DistributedQuery distributedQuery;

    public SQLQueryParser() {
        queryString = "";
    }

    public SQLQueryParser(String queryString) {
        parse(queryString);
    }

    public SQLQueryParser(DistributedQuery distributedQuery) {
        this.distributedQuery = distributedQuery;
        parse(distributedQuery);
    }

    public void parse(DistributedQuery distributedQuery) {
        this.distributedQuery = distributedQuery;
        parse(distributedQuery.getDBPathQueryString());
    }

    public void parse(String queryString) {
        this.queryString = queryString;
        if (queryString == null) {
            System.out.println("Null query string!");
            return;
        }
        this.queryUpperCaseString = queryString.toUpperCase();
        extractClauses();
        calculateRequiredTables();
    }

    public void extractClauses() { // need to make private
        if (queryUpperCaseString.equals("")) {
            // ***throw an exception!
            System.out.print("Empty query string!");
            return;
        }
        int selectindex = queryUpperCaseString.indexOf("SELECT"); // +7
        int fromindex = queryUpperCaseString.indexOf("FROM");     // +5
        int whereindex = queryUpperCaseString.indexOf("WHERE");   // +6

        if (selectindex == -1) { // no SELECT keyword
            System.out.println("Malformed Query! No SELECT keyword found.");
            return;
        } else if (selectindex != 0) { // malformed query
            System.out.println("Malformed Query! SELECT keyword must be the first word.");
            return;
        } else {
            selectString = queryString.substring(selectindex + 7, fromindex);
        }
        if (fromindex == -1) { // no FROM keyword
            System.out.println("Malformed Query! No FROM keyword found.");
            return;
        } else if (whereindex != -1) { // there is a WHERE keyword
            fromString = queryString.substring(fromindex + 5, whereindex);
        } else { // no WHERE keyword
            fromString = queryString.substring(fromindex + 5, queryString.length());
        }

        if (whereindex == -1)
            whereString = "";
        else if ((whereindex + 6) == (queryUpperCaseString.length() - 1))
            whereString = "";
        else
            whereString = queryString.substring(whereindex + 6, queryString.length());
    }

    private String clausesToString() {
        StringBuffer sb = new StringBuffer();
        sb.append("SELECT=" + selectString + "\n");
        sb.append("FROM=" + fromString + "\n");
        sb.append("WHERE=" + whereString + "\n");
        return sb.toString();
    }

    public void calculateRequiredTables() {
        String currentSelect;
        String currentFrom;
        String currentWhere;
        StringTokenizer stringTokenizer;
        SQLTableQuery tempTableQuery;
        SQLMonoDBQuery tempMonoDBQuery;
        // Vector columnVector = new Vector();
        Vector fromVector = new Vector();
        Vector whereVector = new Vector();

        // Tables for FROM -- the FROM clause helps enumerate the tables that
        // need to be accessed (be sure to add aliases later)
        stringTokenizer = new StringTokenizer(fromString, ",");
        while (stringTokenizer.hasMoreTokens()) {
            // most basic case of FROM
            currentFrom = stringTokenizer.nextToken();
            currentFrom = currentFrom.trim();
            if (getDBPath(currentFrom).equalsIgnoreCase("")) {
                // SHOULD NEVER GET HERE... throw exception!
            } else {
                // IMPLEMENT FOR MULTIPLE DBPath's
                String dbName = getDBPath(currentFrom);
                String tableName = cropDBPath(currentFrom);
                tempTableQuery = new SQLTableQuery();
                tempTableQuery.setDBName(dbName);
                tempTableQuery.setTableName(tableName);
                tempTableQuery.addFrom(tableName);
                fromVector.add(tableName);
                if (distributedQuery.getMonoDBQuery(dbName) == null) {
                    // if it doesn't exist, create a new SQLMonoDBQuery object
                    tempMonoDBQuery = new SQLMonoDBQuery();
                    tempMonoDBQuery.setDatabaseName(dbName);
                    distributedQuery.putMonoDBQuery(dbName, tempMonoDBQuery);
                } else {
                    tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
                }
                tempMonoDBQuery.putTableQuery(tableName, tempTableQuery);
            }
        }

        // Tables for SELECT
        stringTokenizer = new StringTokenizer(selectString, ",");
        while (stringTokenizer.hasMoreTokens()) {
            // most basic case of selecting
            currentSelect = stringTokenizer.nextToken();
            currentSelect = currentSelect.trim();
            if (getDBPath(currentSelect).equalsIgnoreCase("")) { // if DBPath not specified
                // throw an EXCEPTION! -- DBPath's are required
            } else {
                // IMPLEMENT FOR DBPath if given a specific DBPath for the columns
                String dbName = getDBPath(currentSelect);
                String tableColumnName = cropDBPath(currentSelect);
                String tableName = getTablePath(tableColumnName);
                String columnName = cropTablePath(tableColumnName);
                insertSelectIntoTableQuery(dbName, tableName, columnName);
            }
        }

        // Tables for WHERE
        if (whereString != "") {
            stringTokenizer = new StringTokenizer(whereString, " ");
            StringBuffer sb = new StringBuffer();
            String upperCaseWhereString = whereString.toUpperCase(); // to find ANDs and ORs easier
            while (stringTokenizer.hasMoreTokens()) {
                currentWhere = stringTokenizer.nextToken();
                currentWhere = currentWhere.trim();
                // NEED TO CARVE OUT MORE LOGIC HERE
                if (currentWhere.equalsIgnoreCase("AND") || currentWhere.equalsIgnoreCase("OR")) {
                    whereVector.add(sb.toString());
                    sb = new StringBuffer();
                }
                sb.append(currentWhere + " ");
            }
            if (sb.toString() != "") {
                if (sb.charAt(sb.length() - 1) == ' ') {
                    sb.deleteCharAt(sb.length() - 1);
                    whereVector.add(sb.toString());
                } else {
                    whereVector.add(sb.toString());
                }
            }

            String currentCondition;
            String currentToken = "";
            Vector strippedWhereVector = new Vector();
            for (int i = 0; i < whereVector.size(); i++) {
                currentCondition = (String) whereVector.elementAt(i);
                stringTokenizer = new StringTokenizer(currentCondition, " ");
                StringBuffer noDBPathWhere = new StringBuffer();
                int firstindex = currentCondition.indexOf("->");
                int lastindex = currentCondition.lastIndexOf("->");
                String firstDB = "";
                String firstTable = "";
                String firstColumn = "";
                String secondDB = "";
                String secondTable = "";
                String secondColumn = "";
                while (stringTokenizer.hasMoreTokens()) {
                    currentToken = stringTokenizer.nextToken();
                    if (firstDB.equals("") && (currentToken.indexOf("->") != -1)) {
                        // no DB was found yet
                        firstDB = getDBPath(currentToken);
                        String firstTableColumn = cropDBPath(currentToken);
                        firstTable = getTablePath(firstTableColumn);
                        firstColumn = cropTablePath(firstTableColumn);
                        noDBPathWhere.append(firstTableColumn + " ");
                    } else if ((firstDB.equals("") == false) && (currentToken.indexOf("->") != -1)) {
                        // first DB already found
                        secondDB = getDBPath(currentToken);
                        String secondTableColumn = cropDBPath(currentToken);
                        secondTable = getTablePath(secondTableColumn);
                        secondColumn = cropTablePath(secondTableColumn);
                        noDBPathWhere.append(secondTableColumn + " ");
                    } else {
                        noDBPathWhere.append(currentToken + " ");
                    }
                }
                strippedWhereVector.add(noDBPathWhere.toString());

                if (firstDB.equals("")) { // no DBPaths read!
                    System.out.println("ERROR! MALFORMED WHERE");
                } else if (secondDB == "") { // only one DB is used
                    // SELECT: insert firstColumn into firstDB->firstTable SELECT
                    insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                    // FROM: no changes
                    // WHERE: insert noDBPathWhere into firstDB->firstTable WHERE
                    insertWhereIntoTableQuery(firstDB, firstTable, noDBPathWhere.toString());
                } else if (firstDB.equalsIgnoreCase(secondDB)) {
                    // one DB is referenced, but used twice in the WHERE clause
                    if (firstTable.equalsIgnoreCase(secondTable)) {
                        // a join is performed with the same table (self-join)
                        // SELECT: insert firstColumn into firstDB->firstTable,
                        //         insert secondColumn into secondDB->secondTable
                        insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                        insertSelectIntoTableQuery(secondDB, secondTable, secondColumn);
                        // FROM: no changes
                        // WHERE: insert noDBPathWhere into firstDB->firstTable
                        //        (DO NOT insert into secondDB->secondTable)
                        insertWhereIntoTableQuery(firstDB, firstTable, noDBPathWhere.toString());
                    } else {
                        // a join is performed on two different tables on the same DB (natural join)
                        // SELECT: insert firstColumn into firstDB->firstTable SELECT,
                        //         insert secondColumn into secondDB->secondTable SELECT
                        insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                        insertSelectIntoTableQuery(secondDB, secondTable, secondColumn);
                        // FROM: insert firstTable into secondDB->secondTable FROM,
                        //       insert secondTable into firstDB->firstTable FROM
                        insertFromIntoTableQuery(secondDB, secondTable, firstTable);
                        insertFromIntoTableQuery(firstDB, firstTable, secondTable);
                        // WHERE: insert noDBPathWhere into firstDB->firstTable WHERE,
                        //        insert noDBPathWhere into secondDB->secondTable WHERE
                        insertWhereIntoTableQuery(firstDB, firstTable, noDBPathWhere.toString());
                        insertWhereIntoTableQuery(secondDB, secondTable, noDBPathWhere.toString());
                    }
                } else if (firstDB.equalsIgnoreCase(secondDB) == false) {
                    // SELECT: insert firstColumn into firstDB->firstTable SELECT,
                    //         insert secondColumn into secondDB->secondTable SELECT
                    insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                    insertSelectIntoTableQuery(secondDB, secondTable, secondColumn);
                    // FROM: no changes
                    // WHERE: no changes -- DO NOT send to either DB (let aggregated query handle)
                }
            }
        }
    }

    private void insertWhereIntoTableQuery(String dbName, String tableName, String whereString) {
        // throws Exceptions!
        SQLMonoDBQuery tempMonoDBQuery;
        SQLTableQuery tempTableQuery;
        if (distributedQuery.getMonoDBQuery(dbName) == null) {
            // if it doesn't exist, throw EXCEPTION!
            // throw new DBNotFoundException!!!
            System.out.println("ERROR! DBNotFoundException in SQLQueryParser.insertWhereIntoTableQuery()");
        } else {
            tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
            if (tempMonoDBQuery.getTableQuery(tableName) == null) {
                // throw new TableQueryNotFoundException!!
                System.out.println("ERROR! TableQueryNotFoundException in SQLQueryParser.insertWhereIntoTableQuery()");
            } else {
                tempTableQuery = (SQLTableQuery) tempMonoDBQuery.getTableQuery(tableName);
                // make sure that the TableQuery WHERE clause does not begin with AND or OR
                Enumeration whereEnumeration = tempTableQuery.getWhereEnumeration();
                if (whereEnumeration.hasMoreElements()) { // if there are already WHERE's
                    tempTableQuery.addWhere(whereString);
                } else {
                    // strip AND or OR off whereString if it's there, then add the whereString
                    StringBuffer sb = new StringBuffer();
                    StringTokenizer stringTokenizer = new StringTokenizer(whereString, " ");
                    String currentWhere;
                    while (stringTokenizer.hasMoreTokens()) {
                        currentWhere = stringTokenizer.nextToken();
                        if (currentWhere.equalsIgnoreCase("AND") || currentWhere.equalsIgnoreCase("OR")) {
                            // don't add currentWhere to the StringBuffer
                        } else {
                            sb.append(currentWhere);
                            if (stringTokenizer.hasMoreTokens())
                                sb.append(" ");
                        }
                    }
                    tempTableQuery.addWhere(sb.toString()); // add the stripped whereString
                }
            }
        }
    }

    private void insertSelectIntoTableQuery(String dbName, String tableName, String columnName) {
        // throws Exceptions!
        SQLMonoDBQuery tempMonoDBQuery;
        SQLTableQuery tempTableQuery;
        if (distributedQuery.getMonoDBQuery(dbName) == null) {
            // if it doesn't exist, throw EXCEPTION!
            // throw new DBNotFoundException!!!
            System.out.println("ERROR! DBNotFoundException in SQLQueryParser.insertSelectIntoTableQuery()");
        } else {
            tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
            if (tempMonoDBQuery.getTableQuery(tableName) == null) {
                // throw new TableQueryNotFoundException!!
                System.out.println("ERROR! TableQueryNotFoundException in SQLQueryParser.insertSelectIntoTableQuery()");
            } else {
                tempTableQuery = (SQLTableQuery) tempMonoDBQuery.getTableQuery(tableName);
                tempTableQuery.addSelect(tableName + "." + columnName);
            }
        }
    }

    private void insertFromIntoTableQuery(String dbName, String tableName, String destinationTableQuery) {
        // throws Exceptions!
        SQLMonoDBQuery tempMonoDBQuery;
        SQLTableQuery tempTableQuery;
        if (distributedQuery.getMonoDBQuery(dbName) == null) {
            // if it doesn't exist, throw EXCEPTION!
            // throw new DBNotFoundException!!!
            System.out.println("ERROR! DBNotFoundException in SQLQueryParser.insertFromIntoTableQuery()");
        } else {
            tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
            if (tempMonoDBQuery.getTableQuery(destinationTableQuery) == null) {
                // throw new TableQueryNotFoundException!!
                System.out.println("ERROR! TableQueryNotFoundException in SQLQueryParser.insertFromIntoTableQuery()");
            } else {
                tempTableQuery = (SQLTableQuery) tempMonoDBQuery.getTableQuery(destinationTableQuery);
                tempTableQuery.addFrom(tableName);
            }
        }
    }

    private String getDBPath(String pathString) { // throws BadDBPathException?
        int pathindex = pathString.indexOf("->");
        if (pathindex == -1)
            return ""; // throw BadDBPathException
        else {
            String DBPathString = pathString.substring(0, pathindex);
            return DBPathString;
        }
    }
    private String cropDBPath(String pathString) { // throws BadDBPathException?
        int pathindex = pathString.indexOf("->");
        if (pathindex == -1)
            return ""; // throw BadDBPathException
        else {
            String croppedDBString = pathString.substring(pathindex + 2, pathString.length());
            return croppedDBString;
        }
    }

    private String getTablePath(String pathString) { // throws BadTableException?
        int pathindex = pathString.indexOf(".");
        if (pathindex == -1)
            return ""; // throw BadTableException
        else {
            String DBPathString = pathString.substring(0, pathindex);
            return DBPathString;
        }
    }

    private String cropTablePath(String pathString) { // throws BadTableException?
        int pathindex = pathString.indexOf(".");
        if (pathindex == -1)
            return ""; // throw BadTableException
        else {
            String croppedDBString = pathString.substring(pathindex + 1, pathString.length());
            return croppedDBString;
        }
    }
}

DBDelegator.java

import java.util.*;
import java.sql.*;

public class DBDelegator {
    private ClassMapRepository classMapRepository;
    private InformixJDBCHandler localDBHandler;
    private Hashtable remoteDBHandlerHashtable;
    private DistributedQuery distributedQuery;
    private Vector localTableNamesVector;

    public DBDelegator(DistributedQuery distributedQuery) {
        remoteDBHandlerHashtable = new Hashtable();
        localTableNamesVector = new Vector();
        this.distributedQuery = distributedQuery;
        if (distributedQuery != null)
            this.classMapRepository = distributedQuery.getClassMapRepository();
    }

    public DBDelegator() {
        remoteDBHandlerHashtable = new Hashtable();
        localTableNamesVector = new Vector();
    }

    public void setClassMapRepository(ClassMapRepository classMapRepository) {
        this.classMapRepository = classMapRepository;
    }

    public ClassMapRepository getClassMapRepository() { return classMapRepository; }

    public void setDistributedQuery(DistributedQuery distributedQuery) {
        this.distributedQuery = distributedQuery;
    }

    public String getFinalResultString() {
        setupLocalDBHandler();
        setupRemoteDBHandlers();
        processTableQueries();
        ResultSet finalResultSet = getAggregateQueryResultSet();
        return resultSetToString(finalResultSet);
    }

    public void setupLocalDBHandler() {
        // values need to be changed if a change is made to the configuration of the localDB
        localDBHandler = new InformixJDBCHandler(true);
        localDBHandler.setUrl("jdbc:informix-sqli://18.66.0.25:1013/bfuthesis:INFORMIXSERVER=ICMIT");
        localDBHandler.setUser("informix");
        localDBHandler.setPassword("AndrewMc");
    }

    public void setupRemoteDBHandlers() {
        if (classMapRepository != null) {
            Enumeration dbEnumeration = classMapRepository.getDBEnumeration();
            while (dbEnumeration.hasMoreElements()) {
                String currentDB = (String) dbEnumeration.nextElement();
                ClassMap classMap = (ClassMap) classMapRepository.getClassMap(currentDB);
                String connectionUser = classMap.getUser();
                String connectionPassword = classMap.getPassword();
                String connectionIP = classMap.getIP();
                String connectionPort = classMap.getPort();
                String connectionDBName = classMap.getDatabaseName();
                String connectionDBAlias = classMap.getDatabaseAlias();
                String connectionINFORMIXSERVER = classMap.getAdditionalParameters();
                // strip the 'INFORMIXSERVER=' part of the string
                connectionINFORMIXSERVER =
                    connectionINFORMIXSERVER.substring(connectionINFORMIXSERVER.indexOf("=") + 1);
                InformixJDBCHandler currentHandler = new InformixJDBCHandler(false); // no write access
                currentHandler.setUser(connectionUser);
                currentHandler.setPassword(connectionPassword);
                currentHandler.setIP(connectionIP);
                currentHandler.setPort(connectionPort);
                currentHandler.setDB(connectionDBName);
                currentHandler.setINFORMIXSERVER(connectionINFORMIXSERVER);
                currentHandler.updateUrl();
                remoteDBHandlerHashtable.put(connectionDBAlias, currentHandler);
            }
        } else {
            System.out.println("ERROR! DBDelegator.setupRemoteDBHandlers(): ClassMapRepository not set/initialized!");
        }
    }

    public void processTableQueries() {
        if ((localDBHandler != null) && (remoteDBHandlerHashtable.size() != 0)
                && (distributedQuery != null)) {
            Enumeration monoDBKeysEnumeration = distributedQuery.getMonoDBKeys();
            while (monoDBKeysEnumeration.hasMoreElements()) {
                String currentMonoDBString = (String) monoDBKeysEnumeration.nextElement();
                SQLMonoDBQuery currentMonoDBQuery =
                    (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(currentMonoDBString);
                if (currentMonoDBQuery == null) {
                    System.out.println("ERROR! DBDelegator.processTableQueries(): No MonoDBQuery's in distributedQuery object!");
                } else {
                    // grab a handle of the monoDB Handler HERE
                    String dbAlias = currentMonoDBQuery.getDatabaseName();
                    InformixJDBCHandler currentDBHandler =
                        (InformixJDBCHandler) remoteDBHandlerHashtable.get(dbAlias);
                    if (currentDBHandler == null) {
                        // bad dbAlias, or no handler registered with that key
                        System.out.println("ERROR! DBDelegator.processTableQueries(): dbAlias does not have an InformixJDBCHandler!");
                    } else {
                        Enumeration tableQueryKeysEnumeration = currentMonoDBQuery.getTableKeys();
                        while (tableQueryKeysEnumeration.hasMoreElements()) {
                            String currentTableQueryString =
                                (String) tableQueryKeysEnumeration.nextElement();
                            SQLTableQuery currentTableQuery =
                                (SQLTableQuery) currentMonoDBQuery.getTableQuery(currentTableQueryString);
                            if (currentTableQuery == null) {
                                System.out.println("ERROR!
DBDelegator.processTableQueries (): No TableQuery's in currentMonoDBQuery object!"); else String currentSQL = currentTableQuery.toSQLString(; try ResultSet currentResultSet = currentDBHandler.getResultSet(currentSQL); String localTableName = dbAlias + " currentTableQuery.getTableName(); //create table " + [DBAlias] _[TableName] //localDBHandler.insertResultSet(localTableName, currentResultSet); localDBHandler.insertTest(localTableName, currentResultSet); localTableNamesVector.addElement(localTableName); //keep track of the Table Names added to the localDB catch (SQLException e) System.out.println("ERROR! Problems processing DBDelegator.processTableQueries() SQL ERROR:"+e.getMessage H) else System. out.println error!"); ("ERROR! DBDelegator.processTableQueries(): uninitialized object public void dropLocalTables) try for (int i=O; i < localTableNamesVector.size(; i++) String currentTable = (String) localTableNamesVector.elementAt (i) localDBHandler.dropTable (currentTable) localTableNamesVector = new Vector)); catch (SQLException e) System.out.println("ERROR! ERROR:"+e.getMessage)); Problems processing DBDelegator.dropLocalTables(): SQL public String convertToAggregateQuery(String dbPathQuery) StringTokenizer stringTokenizer = new StringTokenizer(dbPathQuery, "->"); StringBuffer sb = new StringBuffer); String currentToken; while (stringTokenizer.hasMoreTokens() currentToken = stringTokenizer.nextToken() sb.append(currentToken); if (stringTokenizer.hasMoreTokens() sb.append("_"); //assumes with the name ' [DBPath]_[Table]' each [DBPath]->[Table] return sb.toString(); 65 will be stored in the localDB public ResultSet getAggregateQueryResultSet() try String aggregateQuery = convertToAggregateQuery (distributedQuery.getDBPathQueryString()) distributedQuery. setAggregateQuerystring (aggregateQuery) return localDBHandler.getResultSet (aggregateQuery) catch (SQLException e) System.out. println ("ERROR! 
Problems processing DBDelegator.getAggregateQueryResultSet(): SQL ERROR: "+e.getMessage()) return null; public String resultSetToString (ResultSet resultSet) StringBuffer sb = new StringBuffer(); try ResultSetMetaData metaData = resultSet.getMetaData(; for (int i=1; i <= metaData.getColumnCount(); i++) sb.append(metaData.getColumnName(i)); if (i == metaData.getColumnCounto) sb.append("\n"); //last line else sb.append("\t"); while for //tabbed out (resultSet.next() (int i=l; i <= metaData.getColumnCount(; i++) sb.append(resultSet.getString(i)); if (i == metaData.getColumnCounto) sb.append("\n"); //last line else sb.append("\t"); catch //tabbed out (SQLException e) System. out.println("ERROR! ERROR:"+e.getMessage)); Problems processing DBDelegator.resultSetToString(): return sb.toStringo); 66 SQL SQLJDBCHandler.java import java.sql.*; import com.informix.jdbc.*; import java.io.*; abstract class SQLJDBCHandler private String connectionUrl = null; private String connectionUser = null; private String connectionPassword = null; private private private private String String String String connectionIP = null; connectionPort = null; connectionDB = null; connectionINFORMIXSERVER = null; private boolean writeAccess = false; public abstract ResultSet getResultSet(String public void setParams (String url, String user, statementString) throws SQLException; String password) setUrl (url) setUser(user); setPassword(password); public abstract void updateUrl(); //updates the connectionUrl with the params public void setUrl(String url) connectionUrl = url; public String getUrl() return connectionUrl; public void setUser(String user) connectionUser = user; public String getUser() return connectionUser; public void setPassword(String password) connectionPassword = password; public String getPassword() return connectionPassword; public void setIP(String ipString) connectionIP = ipString; public String getIP() return connectionIP; 67 public void setPort(String port) connectionPort = port; 
public String getPort() return connectionPort; public void setDB(String DB) connectionDB = DB; public String getDB() return connectionDB; public void setINFORMIXSERVER(String server) connectionINFORMIXSERVER = server; public String getINFORMIXSERVER() return connectionINFORMIXSERVER; //***WRITE methods***// public abstract void insertResultSet(ResultSet resultSet) public abstract void releaseResultSetResources(ResultSet SQLException; public abstract SQLException; public throws SQLException; resultSet) throws String generateTableSQL(ResultSetMetaData resultSetMetaData) //returns SQL for recreating tables from a ResultSet abstract void copyResultSet(ResultSet throws SQLException; 68 origResultSet, ResultSet throws copyResultSet) InformixJDBCHandler.java import java.sql.*; import com.informix.jdbc.*; import java.io.*; public class InformixJDBCHandler implements SQLJDBCHandler private String connectionUrl = null; private String connectionUser = null; private String connectionPassword = null; private private private private String String String String connectionIP = null; connectionPort = null; connectionDB = null; connectionINFORMIXSERVER = null; private boolean writeAccess = false; public InformixJDBCHandler(boolean writeBoolean) this.writeAccess = writeBoolean; public ResultSet getResultSet (String statementString) throws SQLException (connectionUrl == null) if throw new SQLException("ERROR: else if null!"); (connectionUser == null) throw new SQLException("ERROR: else if Connection URL is Connection User is null!"); (connectionPassword == null) throw new SQLException("ERROR: Connection Password is null!"); String cmd = statementString; ResultSet resultSet = null; Connection conn = null; try Class. forName ("com. informix.jdbc. IfxDriver") //Load Informix JDBC driver catch (Exception e) throw new SQLException("ERROR: e.getMessage) + ")"); failed to load Informix JDBC driver." 
+ "(" + try conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword); //Make the connection to the DB thru the URL authenicating with user/password catch (SQLException e) throw new SQLException("ERROR: failed to connect!" + "(" + e.getMessage() try Statement stmt = conn.createStatement(ResultSet.TYPESCROLLINSENSITIVE, ResultSet.CONCURUPDATABLE); resultSet = stmt.executeQuery(cmd); 69 + ")"); catch (SQLException e) execution failed throw new SQLException("ERROR: e.getMessage() + I')"); - statement:" + cmd + "(" + return resultSet; public void setParams (String url, String user, String password) setUrl(url); setUser(user); setPassword(password); public void updateUrl() connectionUrl = //updates the connectionUrl with the params "jdbc:informix-sqli://" connectionDB + ":" + + connectionPort + "/" + connectionIP + ":" + connectionINFORMIXSERVER; + "INFORMIXSERVER=" public void setUrl(String url) connectionUrl = url; public String getUrl() return connectionUrl; public void setUser(String user) connectionUser = user; public String getUser() return connectionUser; public void setPassword(String password) connectionPassword = password; public String getPassword() return connectionPassword; public void setIP(String ipString) connectionIP = ipString; public String getIP() return connectionIP; public void setPort(String port) connectionPort = port; public String getPort() 70 return connectionPort; public void setDB(String DB) connectionDB = DB; public String getDB() return connectionDB; public void setINFORMIXSERVER(String server) connectionINFORMIXSERVER = server; public String getINFORMIXSERVER() return connectionINFORMIXSERVER; //***WRITE methods***// public void insertResultSet(String tableName, ResultSet resultSet) throws SQLException (writeAccess == if false) throw new SQLException("ERROR: else if (connectionUrl == null) throw new SQLException("ERROR: else if Connection URL is null!"); (connectionUser == null) throw new SQLException("ERROR: 
else if DB not initialized for write access!"); Connection User is null!"); (connectionPassword == null) throw new SQLException("ERROR: Connection Password is null!"); Connection conn = null; try Class.forName("com.informix.jdbc.IfxDriver"); //Load Informix JDBC driver catch (Exception e) throw new SQLException("ERROR: e.getMessage() + ")"); failed to load Informix JDBC driver." + "C(" + try conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword); //Make the connection to the DB thru the URL authenicating with user/password catch (SQLException e) throw new SQLException("ERROR: failed to connect!" try Statement stmt = conn.createStatement(); 71 + "(" + e.getMessage() + ")"); String tableStatementString = generateTableSQL(tableName, resultSet); tablestatement = stmt.executeUpdate(tableStatementString); int //release the DB resources for the statement stmt.close(); catch (SQLException e) throw new SQLException("ERROR: execution failed - ResultSet Insert:" e.getMessageo) + ")"); + "(" + public void insertTest(String tableName, ResultSet resultSet) throws SQLException try insertResultSet(tableName, resultSet); ResultSet destinationResultSet = getResultSet("SELECT * FROM " + tableName); //grab a handle on the newly created table //MUST HAVE ResultSetMetaData metaData = destinationResultSet.getMetaData(); THIS LINE -- BUG IN INFORMIX JDBC driver!!! 
copyResultSet(resultSet, destinationResultSet); releaseResultSetResources(resultSet); releaseResultSetResources(destinationResultSet); catch (SQLException e) throw new SQLException("ERROR: execution failed - insertTest" + "(" e.getMessage() + I')"); + public void dropTable(String tableString) throws SQLException (writeAccess == if false) throw new SQLException("ERROR: DB not initialized for write access!"); else if (connectionUrl == null) throw new SQLException("ERROR: Connection URL is null!"); else if (connectionUser == null) throw new SQLException("ERROR: Connection User is null!"); else if (connectionPassword == null) throw new SQLException("ERROR: Connection Password is null!"); Connection conn = null; try Class.forName( "com.informix.jdbc.IfxDriver"); //Load Informix JDBC driver catch (Exception e) throw new SQLException("ERROR: e.getMessage() + ")"); failed to load Informix JDBC driver." + "(" + try conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword); //Make the connection to the DB thru the URL authenicating with user/password catch (SQLException e) 72 throw new SQLException("ERROR: failed to connect!" 
+ e.getMessageo) + "(" + ")")- try Statement stmt = conn.createStatement(ResultSet.TYPESCROLLINSENSITIVE, ResultSet.CONCURUPDATABLE); stmt.execute("DROP TABLE "+tableString); stmt.close(); //release the DB resources for the statement catch (SQLException e) - DROP TABLE:" + throw new SQLException("ERROR: execution failed " (e.getSQLState()=" + e.getSQLStateo) + " e.getErrorCode(="+e.getErrorCode(+")"); public void releaseResultSetResources(ResultSet resultSet) throws SQLException Statement stmt = resultSet.getStatement(); resultSet.closeo); stmt.closeo); public String generateTableSQL (String tableName, ResultSet resultSet) SQLException //returns SQL for recreating tables from a ResultSet throws int columntype = 0; String columnNameString = ""; String columnTypeString = ""; StringBuffer sb = new StringBuffer(); try ResultSetMetaData resultSetMetaData = resultSet.getMetaData(); //map table name with some sb.append("CREATE TABLE "+ tableName + " ( "); significance int numberofcolumns = resultSetMetaData.getColumnCounto); for (int i=1; i <= numberofcolumns; i++) //columntype = resultSetMetaData.getColumnType(i); columnNameString = resultSetMetaData.getColumnName(i); columnTypeString = JDBCtoInformixType(resultSetMetaData.getColumnTypeName(i)); sb.append("\n "); " + columnTypeString); sb.append(columnNameString + " //getTypeName(column_type)); (columnTypeString.equalsIgnoreCase ("CHAR") if columnTypeString. equalsIgnoreCase ("VARCHAR") 1 columnTypeString.equalsIgnoreCase ("DECIMAL") columnTypeString. 
equalsIgnoreCase ("LONGVARCHAR") | size if the column is of the above types //insert sb.append(" (" + resultSetMetaData.getColumnDisplaySize(i) if + ")") (i < numberofcolumns) sb.append(","); sb.append("\n catch );"); (SQLException e) throw new SQLException("ERROR: execution column count:" + "(" + e.getMessage() + ")"); return sb.toStringo); 73 failed - ResultSetMetaData cannot get public void copyResultSet(ResultSet SQLException //Req: ResultSet copyResultSet) throws Tables column types must be the same and be indexed! ResultSetMetaData int type_int; while origResultSet, origMetaData = origResultSet.getMetaData() (origResultSet.next() == true) row // moves cursor to the insert copyResultSet.moveToInsertRow() //for each column in the i++) i=1; i <= origMetaData.getColumnCount(); for (int row type_int if = origMetaData.getColumnType(i) (typeint == Types.ARRAY) //CANNOT COPY WITH JDBC //throw Exception else if (typeint == Types.BIGINT) copyResultSet.updateInt(i, else if origResultSet.getInt(i)) (typeint == Types.BINARY) try InputStream tempInputStream = origResultSet.getBinaryStream(i); copyResultSet.updateBinaryStream(i, tempInputStream, tempInputStream.available() ); //check if tempInputStream.available() is correct catch (IOException e) throw new SQLException("ERROR: Problems processing InputStream in Query: e.getMessage)); else if (typeint == Types.BIT) //CANNOT COPY WITH JDBC //throw Exception else if (type_int == Types.BLOB) //CANNOT COPY WITH JDBC //throw Exception else if (typeint == Types.CHAR) copyResultSet.updateInt(i, else if origResultSet.getInt(i)); (type_int == Types.CLOB) //CANNOT COPY WITH JDBC //throw Exception else if (typeint == Types.DATE) copyResultSet.updateDate (i, else if origResultSet. getDate (i) (typeint == Types.DECIMAL) copyResultSet.updateBigDecimal (i, else if origResultSet. 
getBigDecimal (i)); (type_int == Types.DISTINCT) 74 + //CANNOT COPY WITH JDBC //throw Exception else if (type-int == Types.DOUBLE) copyResultSet.updateDouble (i, else if (typeint == Types.FLOAT) origResultSet.getFloat (i)) copyResultSet.updateFloat(i, else if (type_int == Types.INTEGER) copyResultSet.updateInt(i, else if origResultSet .getDouble (i)) (typeint origResultSet.getInt(i)) == Types.JAVAOBJECT) copyResultSet.updateObject (i, else if origResultSet. getObject (i)); (type_int == Types.LONGVARBINARY) try InputStream tempInputStream = origResultSet.getBinaryStream(i) copyResultSet.updateBinaryStream(i, tempInputStream, tempInputStream.available() ); catch (IOException e) throw new SQLException("ERROR: Problems processing InputStream in Query: e.getMessage)); else if (type_int == Types.LONGVARCHAR) copyResultSet.updateString(i, else if origResultSet.getString(i)) (type_int == Types.NULL) copyResultSet.updateNull(i); else if (type_int == Types.NUMERIC) //CANNOT COPY WITH JDBC //throw Exception else if (type_int == Types.OTHER) //CANNOT COPY WITH JDBC //throw Exception else if (typeint == Types.REAL) //CANNOT COPY WITH JDBC //throw Exception else if (typeint == Types.REF) //CANNOT COPY WITH JDBC //throw Exception else if (typeint == Types.SMALLINT) copyResultSet.updateInt(i, else if origResultSet.getInt(i)) (typeint == Types.STRUCT) //CANNOT COPY WITH JDBC //throw Exception 75 + else if (type int == Types.TIME) copyResultSet.updateTime(i, else if origResultSet.getTime(i)) (type-int == Types.TIMESTAMP) copyResultSet.updateTimestamp(i, else if (type-int == Types.TINYINT) copyResultSet.updateInt (i, else if origResultSet.getTimestamp (i)) origResultSet. 
getInt (i)) (type_int == Types.VARBINARY) try InputStream tempInputStream = origResultSet.getBinaryStream(i); tempInputStream, copyResultSet.updateBinaryStream(i, tempInputStream.available() ); catch (IOException e) throw new SQLException("ERROR: e.getMessage)); else if Problems processing InputStream in Query: (type_int == Types.VARCHAR) copyResultSet.updateString(i, origResultSet.getString(i)) copyResultSet.insertRow(); copyResultSet.moveToCurrentRow(); Connection conn = (copyResultSet.getStatement() .getConnection(; //mapping between JDBC Types and Informix Types at //www. informix.com/answers/english/docs/220sdk/jdbcl4/program. fm4. html http: // //JDBC API Data Type from java.sql.Types Corresponding Informix Data Type INT8 //BIGINT //BINARY BYTE //BIT Not supported CHAR(n) //CHAR DATE //DATE //DECIMAL DECIMAL FLOAT //DOUBLE SMALLFLOAT //FLOAT INTEGER //INTEGER BYTE //LONGVARBINARY TEXT //LONGVARCHAR //NUMERIC DECIMAL SMALLFLOAT //REAL SMALLINT //SMALLINT DATETIME //TIME DATETIME //TIMESTAMP SMALLINT //TINYINT //VARBINARY BYTE //VARCHAR VARCHAR(m,r) private String JDBCtoInformixType (String type) if (type.equalsIgnoreCase ("BIGINT")) return "INT8"; else if (type.equalsIgnoreCase("BINARY")) return "BYTE"; 76 + else if (type. equalsIgnoreCase ("BIT")) //NOT SUPPORTED! return else if return else if (type.equalsIgnoreCase("CHAR")) "CHAR"; (type.equalsIgnoreCase("DATE")) return "DATE"; else if (type.equalsIgnoreCase ("DECIMAL")) return "DECIMAL"; else if return else if (type.equalsIgnoreCase("DOUBLE")) "FLOAT"; (type.equalsIgnoreCase("FLOAT")) return "SMALLFLOAT"; else if type.equalsIgnoreCase("INT")) (type.equalsIgnoreCase("INTEGER") return "INTEGER"; else if (type.equalsIgnoreCase ("LONGVARBINARY")) return "BYTE"; else if (type. 
equalsIgnoreCase ("LONGVARCHAR")) return "TEXT"; else if (type.equalsIgnoreCase ("NUMERIC")) return "DECIMAL"; else if (type.equalsIgnoreCase("REAL")) return "SMALLFLOAT"; else if (type.equalsIgnoreCase("SMALLINT")) return "SMALLINT"; else if (type.equalsIgnoreCase ("TIME")) return "DATETIME"; else if (type.equalsIgnoreCase("TIMESTAMP")) return "DATETIME"; else if (type.equalsIgnoreCase("TINYINT")) return "SMALLINT"; else if return else if (type. equalsIgnoreCase ("VARBINARY")) "BYTE"; (type.equalsIgnoreCase("VARCHAR")) return "VARCHAR"; return 77 private String InformixtoJDBCType(String type) if (type.equalsIgnoreCase("INT8")) return "BIGINT"; else if return else if (type.equalsIgnoreCase("BYTE")) "BINARY"; (type. equalsIgnoreCase ("CHAR")) return "CHAR"; else if (type.equalsIgnoreCase("DATE")) return "DATE"; else if (type. equalsIgnoreCase ("DECIMAL")) return "DECIMAL"; else if (type.equalsIgnoreCase("FLOAT")) return "DOUBLE"; else if (type. equalsIgnoreCase ("SMALLFLOAT")) return "FLOAT"; else if type.equalsIgnoreCase("INT")) (type.equalsIgnoreCase("INTEGER") return "INTEGER"; else if (type.equalsIgnoreCase("TEXT")) return "LONGVARCHAR"; else if (type.equalsIgnoreCase ("SMALLFLOAT")) return "REAL"; else if (type.equalsIgnoreCase("SMALLINT")) return "SMALLINT"; else if (type.equalsIgnoreCase ("DATETIME")) return "TIME"; else if return (type.equalsIgnoreCase ("VARCHAR")) "VARCHAR"; return private String getTypeName(int typeint) if (type_int == Types.ARRAY) return "ARRAY"; else if (typeint == Types.BIGINT) return "BIGINT"; else if (typeint == Types.BINARY) 78 return else if "BINARY"; (type-int == Types.BIT) return "BIT"; else if (type_int == Types.BLOB) return "BLOB"; else if (type int == Types.CHAR) return "CHAR"; else if (type-int == Types.CLOB) return "CLOB"; else if (type-int == Types.DATE) return "DATE"; else if return else if (typeint Types.DECIMAL) "DECIMAL"; (type-int == Types.DISTINCT) return "DISTINCT"; else if (type-int == Types.DOUBLE) return "DOUBLE"; 
else if (type_int == Types.FLOAT) return "FLOAT"; else if (type-int == Types.INTEGER) return "INTEGER"; else if (type-int == Types.JAVAOBJECT) return "JAVAOBJECT"; else if (type_int == Types.LONGVARBINARY) return "LONGVARBINARY"; else if (typeint == Types.LONGVARCHAR) return "LONGVARCHAR"; else if (typeint == Types.NULL) return "NULL"; else if (type-int == Types.NUMERIC) return "NUMERIC"; else if (typeint == Types.OTHER) return "OTHER"; else if (typeint == Types.REAL) return "REAL"; 79 else if (type-int == Types.REF) return "REF"; else if return else if return else if (typeint == Types.SMALLINT) "SMALLINT"; (typeint == Types.STRUCT) "STRUCT"; (typeint == Types.TIME) return "TIME"; else if (type-int == Types.TIMESTAMP) return "TIMESTAMP"; else if (type_int == Types.TINYINT) return "TINYINT"; else if (type-int == Types.VARBINARY) return "VARBINARY"; else if (type-int == Types.VARCHAR) return "VARCHAR"; else return "" 80
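The JDBC-to-Informix type translation in `generateTableSQL` can be exercised without a live Informix connection. The following standalone sketch reproduces a subset of that mapping for a few common types; the class name `TypeMappingSketch` and its `main` harness are illustrative only and are not part of the federation platform:

```java
// Standalone sketch of the JDBC-to-Informix type mapping applied when
// the DBDelegator materializes remote result sets into local tables.
// Class name and main() harness are illustrative, not part of the thesis code.
public class TypeMappingSketch {

    // Map a JDBC type name to the Informix column type emitted in CREATE TABLE.
    static String jdbcToInformixType(String type) {
        if (type.equalsIgnoreCase("BIGINT")) return "INT8";
        if (type.equalsIgnoreCase("DOUBLE")) return "FLOAT";
        if (type.equalsIgnoreCase("FLOAT")) return "SMALLFLOAT";
        if (type.equalsIgnoreCase("LONGVARCHAR")) return "TEXT";
        if (type.equalsIgnoreCase("TIMESTAMP")) return "DATETIME";
        if (type.equalsIgnoreCase("VARCHAR")) return "VARCHAR";
        return "";  // e.g. BIT, which has no supported Informix equivalent
    }

    public static void main(String[] args) {
        // prints "DOUBLE -> FLOAT" and "TIMESTAMP -> DATETIME"
        System.out.println("DOUBLE -> " + jdbcToInformixType("DOUBLE"));
        System.out.println("TIMESTAMP -> " + jdbcToInformixType("TIMESTAMP"));
    }
}
```

Note that the mapping is lossy in both directions (for example, DOUBLE and the unsupported types collapse), which is why the platform restricts aggregate queries to column types it can faithfully recreate in the local staging tables.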