
Design of a Genetics Database for Gene Chips
and the Human Genome Database
by
Benson Fu
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Bachelor of Science in Electrical Engineering and Computer Science
and Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 22, 2001
© 2001 Massachusetts Institute of Technology
All rights reserved
Author
        Department of Electrical Engineering and Computer Science
        May 22, 2001

Certified by
        C. Forbes Dewey, Jr.
        Professor
        Thesis Supervisor

Accepted by
        Arthur C. Smith
        Chairman, Department Committee on Graduate Students
Design of a Genetics Database for Gene Chips
and the Human Genome Database
by
Benson Fu
Submitted to the Department of Electrical Engineering and Computer Science
on May 22, 2001 in partial fulfillment of the requirements for the degree of
Bachelor of Science in Electrical Engineering and Computer Science
and Master of Engineering in Electrical Engineering and Computer Science
Abstract
Human medical research has traditionally been limited to the analysis of disease
symptoms. Although this research has produced many advancements in the medical
field, the availability of human genetic sequence data will lead to further advances in
diagnosis and treatment. With new sequencing technology and the near-completion of
the Human Genome Project, the situation is rapidly changing. We have designed a
database federation platform that manages gene chip experimental information and
genetic data from the Genome Project. The combination of both sources will provide a
powerful information system for medical research purposes. The integration of
Affymetrix gene chip data and a schema of the Human Genome was used to test the
design.
Keywords: Human Genome Project, Affymetrix, GATC, genetic databases, gene chips,
database federation, federating databases, query mediation, heterogeneous databases
Thesis Supervisor: C. Forbes Dewey
Title: Professor of Medical Engineering and Bioengineering
Contents
INTRODUCTION
TERMINOLOGY
I. BACKGROUND
   A. Past Projects
   B. Current Projects
   C. The Problem at Hand
      Human Genome Database
      GATC Database
      Querying Both Databases
II. DESIGN GOALS
III. TECHNOLOGY USED IN THE FEDERATION PLATFORM
   A. Storage and Processing with a Local Database
      Latency and Throughput
      Storage Management and Scalability
      Efficient Query Processing
      Future Benefits
   B. Interface and Transport with JDBC
      Simplicity and Versatility
      Object-Relational Support
      ODBC Comparison
      Current Implementation
   C. ClassMapper Concept
IV. THE FEDERATION PLATFORM DESIGN
   Starting the Federation Platform
   How a Query Is Structured
   When a Query Is Submitted
V. ARCHITECTURE
   ClassMapperRepository
   DistributedQuery (Data Structure)
   QueryDecomposer
   SQLQueryParser
   DBDelegator
   JDBCHandler
VI. IMPLEMENTATION
   ClassMaps
VII. DISCUSSION
   Bugs
      Malformed ClassMap files
      StringTokenizer Bug
      Large Data Sets
      Dropping Tables
   Future Improvements
      Threading capabilities
      Security
      Query Optimization
   Deploying the Federation Platform
BIBLIOGRAPHY
APPENDIX
   FederationPlatform.java
   ClassMapRepository.java
   ClassMap.java
   DistributedQuery.java
   SQLMonoDBQuery.java
   SQLTableQuery.java
   QueryDecomposer.java
   SQLQueryParser.java
   DBDelegator.java
   SQLJDBCHandler.java
   InformixJDBCHandler.java
Introduction
The Human Genome project has expanded the horizons of both the biological and
medical communities, the latter of which is the ultimate consumer of these advances.
Medical research into human diseases has been mostly based on the analysis of
symptoms, and more recently, the use of genetic sequences. Until several years ago,
sequencing was a prohibitively expensive endeavor. With current technological advances
and the huge push of the Human Genome Project, it appears that the relevant sections
will be sequenced within the next year. This wealth of data can be used for medical
research, but the raw data must be organized into a coherent schema, one which links it
to relevant information.
This thesis proposes an application design for handling Affymetrix Gene Chip
databases and the Human Genome database (HGDB). This application can be used to
access genetic data from a gene chip database and the Human Genome database as if both
were combined into a single database. The application uses a query-mediated approach
to create a database federation where both databases remain autonomous. The concept of
the ClassMapper was implemented into the system to provide descriptions of the
underlying databases. As a proof-of-concept, sample data contributed by the Sorger Lab
at MIT was used. The creation of this program will allow researchers to link their
experimental data with the information held within the Human Genome database.
The Human Genome database is essentially a large distributed work-in-progress
effort that acts as an "encyclopedia" for information about which genes are related, how
they are related, where the gene is located, what research has been done for each gene,
and other related information. Gene chips and DNA microarrays, on the other hand, are
the commercial tools for high volume genetics testing of mRNA samples. The two are
related in that they deal with genomic information. While one holds records of
clinical DNA tests, the other describes the biological behavior of the DNA. To be able to
leverage the information from both, a system must be able to seamlessly access the data
contained in both databases.
The key benefit of implementing a system that can interpret the data from gene
chips in conjunction with the Human Genome Database is that cross-realm queries are
then possible. In the case of Affymetrix gene chips, the results are output to a database
containing experimental data. This database was designed for experiment analyses and
thus contains limited information. The Human Genome Database was designed for
accumulating and distributing genetic data. Being able to tie the two together would
enable the user to make queries that allow data-mining across the two domains. This
would be especially useful since it would allow the user to easily perform compound and
complex queries to obtain information that is not contained in the gene chip database
itself.
This document investigates a database federation approach to enable cross-realm
queries. An application, referenced as the federation platform in this document, was
implemented as a proof-of-concept to handle the gene chip database and the HGDB. The
federation platform was written in Java and several new technologies were implemented
to align the application with its design goals that are mentioned later in this document.
Terminology
This document uses certain terms that have different meanings from document to
document. To clarify the way they are used later in this paper, the following terms are
defined.
Aggregate query - The query used after all of the data is aggregated on the local
database.
Database federation - A system that accesses heterogeneous databases in a loosely
coupled manner. For most database federations, site autonomy is preserved.
Distributed database - A system that accesses homogeneous databases in a tightly
coupled manner. Distributed databases usually have limited site autonomy.
Currently, several major database vendors support distributed databases.
Data warehousing - The concept of storing information from data sources or databases
into a central repository. This repository is then used as the access point for
retrieving information.
DBMS - DataBase Management System.
DBPath - A qualifier used in the query syntax to indicate in which database the table
resides. The syntax for a DBPath is [DatabaseName]->[TableName]. More
information about DBPaths is mentioned later in this document.
End-database - An individual database that is contained in the database federation.
Federated query - A query sent to the federation platform that might access multiple
databases.
Federation Platform - The federated database system that this document describes.
Local Database - The database on the server or intranet used to store tables from end-databases.
Multidatabase system - A system that programmatically accesses multiple databases.
I. Background
Prior to designing the system, many multidatabase system designs were
investigated. Pre-existing databases, existing software, existing hardware, user
requirements, and bandwidth requirements are the determining factors in deciding which
system is optimal. The table below shows the distinctions among different multidatabase
systems. The classification of each system is based on how closely the global
system integrates with the local database management system.
System                        | Type of system  | Global system has access to...                | Local nodes typically are...             | Means of global information sharing
------------------------------|-----------------|-----------------------------------------------|------------------------------------------|-------------------------------------
Distributed database          | Tightly coupled | Internal DBMS functions                       | Homogeneous databases                    | Global name space; global schema
Global-schema multidatabase   | Loosely coupled | DBMS user interface                           | Heterogeneous databases                  | Global schema
Federated database            | Loosely coupled | DBMS user interface                           | Heterogeneous databases                  | Partial global schemas
Multidatabase language system | Loosely coupled | DBMS user interface                           | Heterogeneous databases                  | Access language functions
Homogeneous multidatabase     | Loosely coupled | DBMS user interface + some internal functions | Homogeneous databases                    | Access language functions
Interoperable system          | Interoperable   | Application on top of the DBMS                | Any data source that meets the exchange protocol | Data exchange interface protocol

Table 1. Taxonomy of information-sharing systems.
After the investigation of multidatabase systems, it was decided that the federated
database system design was to be used. While making it appear as if all of the end-databases
are merged into one, a database federation keeps each end-database
autonomous such that they are affected as little as possible. In addition, the nature of the
database federation allows for heterogeneity among its end-databases, an important
benefit when dealing with biological databases. The concept of the ClassMapper [See
Technology Used section later] was also investigated since its utilization aids in
homogenizing various heterogeneous data sources. Since many biological databases are
very heterogeneous, this was an important issue for the system design.
Past and present projects in the field of federated database systems were
researched. Specifically, the historical aspects of past projects and the designs of many
current systems were studied. The tradeoffs for each system helped in determining the
design of the system this document describes. The major projects that were most relevant
to the problem at hand are discussed below.
A. Past Projects
During 1994, there were many ongoing projects involved in global-schema
multidatabase systems and federated database projects. At the time, each global-schema
multidatabase project was either in the research or prototype stage. Many of these
projects have since vanished. Of the federated database projects that were in existence
during 1994, many seem to have disappeared as well. Two noteworthy examples are
discussed below.
Mermaid, a global-schema multidatabase prototype made by Unisys, showed
great promise in the late 1980's [38]. Mermaid's hope was to become a front-end to
distributed heterogeneous databases. The plans were to allow the user of multiple
databases stored under various relational DBMS's to manipulate data using SQL or
ARIEL (ARIEL is a proprietary query language). The complexity of the distributed,
heterogeneous data processing was to be transparent to the user. Mermaid's main
emphasis was in query-processing performance: the internal language DIL (Distributed
Intermediate Language) was optimized for interdatabase processing. Mermaid evolved
into InterViso, a commercial product sold by Data Integration, Inc. Ultimately the
commercial product was discontinued, and little is known about its final development.
Pegasus was designed as a federated object-oriented multidatabase at the Hewlett-Packard Laboratories [38]. The attempt was to build a full DBMS that could integrate
heterogeneous, remote databases. The hope was to have global users add remote
schemas to be imported into the Pegasus database, thus making it a dynamic federated
database. Non-object-oriented schemas were mapped to object-oriented representations
within the global database. The global access language HOSQL (Heterogeneous Object
SQL) had features of a multidatabase language system; however, local users were
responsible for integrating imported schemas. Although the system gained quite a bit of
publicity, Hewlett-Packard eventually discontinued its work on Pegasus. There is little
documentation as to why HP stopped further development, but it is known that its
publications ceased in 1993 when the system was still in its research phase.
B. Current Projects
One can only speculate as to why the former systems stopped being developed.
Perhaps it was more difficult than the companies first anticipated to create a generalized
federated system. It is also possible that the companies simply shifted their focus away from
federated database systems. Regardless, the problem of conquering database heterogeneity
still exists today. Current federated database systems in the biological realm are still
being developed. Several ongoing projects that are tackling the same type of problem
that the federation platform is dealing with are as follows.
One tool was built by several researchers at the University of Pennsylvania in
Philadelphia [41]. These researchers in the Kleisli Project built the tool that allowed
scientists to use a single query interface to compare their data against a variety of
collections. Kleisli currently is a tool for the broad-scale integration of databanks that
supposedly offers "flexible access to biological sources that are highly heterogeneous,
geographically scattered, highly complex, constantly evolving, and high in volume". The
tool does handle a wide variety of data sources but at the expense of ease-of-use. Since
the system was meant to handle nearly any type of data source, the query language of the
system is very complicated and difficult to use. The system overcomes heterogeneity by
expanding the language set for each different data source. Obviously, the language set
becomes more complicated as more data sources are supported. This is where the
ClassMapper concept could come into play to condense the query language by
homogenizing the databases or data sources. As a last note, the Kleisli product was
continued into the commercial world as the product "gX-Engine", which is now
owned by GeneticXchange Inc. [42].
Another company, LION Bioscience AG in Heidelberg, Germany, markets a tool
called SRS [43]. This tool helps the integration of databases for many pharmaceutical
firms. The system handles quite a variety of databases; however, attaching a new
database requires a fair amount of work. Each database type added to the system must
have a specialized interface that must be programmed into SRS. Again, to overcome this
non-generalizable approach is where the ClassMapper concept would come into use.
Transforming a number of heterogeneous databases into a set of homogeneous databases
would allow the federated system to manage its data without the hassle of being limited
by the interfaces of the individual databases. Having a ClassMapper for each database
could provide this homogeneity.
MARGBench is a system which enables querying several databases in SQL by
translating SQL queries into a source database specific interface [24] [44]. Developed at
the Otto-von-Guericke-University in Magdeburg, Germany, MARGBench is a database
federation that simultaneously queries biological source databases online. The
architecture of the system is similar to the architecture of the federation platform in many
ways. MARGBench is a database federation that has a SQL interface, uses the concept
of Adapters instead of Handlers (mentioned in the paper), takes advantage of JDBC
connections to end-databases, and is able to make cross-realm queries across a number of
heterogeneous databases. The system even has a local database to handle caching of the
data. Where MARGBench and the federation platform differ is in the way the end-database table information is revealed. In the federation platform, ClassMaps reveal the
table information of the end-databases before queries are handled. In MARGBench, the
concept of an ontology is used. The ontology is effectively a list of connections that link
the data between the end-databases. While this concept is useful in connecting data, it
does not help to overcome heterogeneity issues. Custom-made adapters must still be
built for each type of database in its federation. Similar to the federation systems
previously mentioned, this non-generalizable approach does not scale well when there is
a large amount of heterogeneity. For the federation platform, the evolution of the
ClassMapper will eventually consolidate database communication into a single interface
no matter how heterogeneous the underlying databases are.
C. The Problem at Hand
Human Genome Database
The Human Genome Database (HGDB) has literally terabytes of information that
includes genetic sequences and related metadata. When the Human Genome database
was first designed, it made sense to order the data in an object-oriented fashion. Because
the nature of the data had fixed associations such that genome data could be treated as
objects, the system was built to handle genomic segments as objects that contained
names, descriptions, and associated links to other objects. In addition, the requirement of
managing such large amounts of data lent itself to an object-oriented design which is
relatively scalable.
The information contained in the Human Genome Database can be broken down
into three main object types. The types are as follows:
* Regions of the human genome, including genes, clones, amplimers (PCR
  markers), breakpoints, cytogenetic markers, fragile sites, ESTs, syndromic
  regions, contigs and repeats.
* Maps of the human genome, including cytogenetic maps, linkage maps, radiation
  hybrid maps, content contig maps, and integrated maps. These maps can be
  displayed graphically via the Web.
* Variations within the human genome, including mutations and polymorphisms,
  plus allele frequency data.
The database contains a huge wealth of information that can be used to reveal
large amounts of genomic information about a gene sequence or gene fragment.
However, the interface to the database is designed for inserting and extracting the data,
not for data mining or complex querying. Thus the information contained in the Human
Genome Database is not being utilized to its full potential.
William Chuang's work with the HGDB was used as an example database in the
federation platform. In his project, an object-relational implementation and ClassMap for
the HGDB was created. The schema was implemented as an end-database in the
federation and the ClassMap was extended to also describe connectivity information.
GATC Database
The companies Affymetrix and Molecular Dynamics teamed up to form the
Genetic Analysis Technology Consortium (GATC) to build a platform to design, process,
read and analyze DNA-chip arrays [10]. One product that came out of the GATC was a
specification for a database to handle the data of DNA-chip experiments. The
information contained in the GATC databases is recorded intensities that correspond to
the amount of targeted DNA present in the sample. These DNA targets correspond to
specific DNA segments that are characterized by certain biological behaviors. For more
information about how DNA-chip arrays work, see [45].
The GATC database specification has a basic relational architecture that is geared
to store experimental data. In William Chuang's work, an object-relational
implementation of the GATC database was created. It was decided that the database
needed to be object-relational to give researchers the ability to easily import their
experimental data directly into the new ORDBMS without requiring additional
massaging.
William Chuang's GATC database was used as an additional example database in
the federation platform. Its schema was implemented as an end-database in the
federation and the ClassMap was extended to also describe connectivity information.
Querying Both Databases
To truly leverage the experimental data in the GATC database and the genomic
characteristic information in the HGDB, the two must be utilized together. The DNA
identifiers (called AccessionID's) in the GATC database correspond to DNA segments
with genomic characteristics. However, these characteristics are stored in the tables and
connections of the HGDB. In order to associate a particular DNA segment from an
experiment with its genomic characteristics, both databases must be used in conjunction
with each other. The problem then becomes the task of querying data across two
database domains.
Merging the two databases is a feasible solution to the problem, but this is not
necessarily the best route. Updating the table information from both sources can be a
cumbersome task, especially if there is no mechanism to tell whether a table needs to be
updated. Moreover, the HGDB literally contains terabytes of information. Managing massive
amounts of data could be a challenging task in and of itself.
The federation platform that this document describes is a solution to this
problem. The two databases are left autonomous as end-databases of the system.
Retrievals from both are performed ad hoc so that the data are fresh. The
system also allows cross-queries across both domains without having to merge the two
schemas. The next section describes the design of the federation platform.
II. Design Goals
The architecture of the system was designed to be a federated platform with the
following design goals in mind.
Data Freshness
Since biological databases are constantly being updated, having the most up-to-date information is often important to the work of the researcher. Working against
old data can sometimes mean the difference between the success and failure of an
experiment. As mentioned before, data warehousing has many of its own advantages, but
it lacks freshness from its databases since its data fetching is not performed when tables
are accessed by the system. In the proposed architecture, the user of the system is
guaranteed fresh data since all of the data fetching is performed on an ad hoc basis.
The tradeoff of this design goal is that if a database in the group is down, the
query will fail [See Figure 2]. In addition, without optimizations, the guarantee of fresh
data comes with the sacrifice of speed, especially if the network connection is at a low
speed.
Site Autonomy
Many of the existing biological databases were designed to serve the purpose of
receiving and hosting biological information. For nearly all of these databases, the
database structures were not intended to be changed. Thus, to modify the underlying
structure of each database, a great amount of work would have to be done. What the
federation platform allows is site autonomy of existing databases. That is, the platform
does not require modifications to the end-databases for them to be used in the system.
The federation platform only requires that it have query access to the end-databases.
In addition to not touching the underlying structure, the database federation
requires no special maintenance at its end-databases. This is especially useful since most
biological databases are maintained by specialized groups that do not have the time or
resources to make modifications for non-critical components of their system. Again, the
federation architecture allows for the site autonomy that is often required for adding
certain databases.
Flexibility/Expandability
To be able to handle additional end-databases with different means of
connectivity or querying interfaces, the system must have a flexible and expandable
architecture. This thesis seeks a partial solution with an expandable architecture.
Since the architecture was designed so that a new handler object is instantiated for
each database registered in the federation, the system can simultaneously use different
database interfaces. This means that as new database interfaces are created for the
platform, old ones do not have to be upgraded or sacrificed since all can be used
concurrently. This functionality allows future support for a large variety of databases in
the database federation.
Ideally, the system will increase its expandability with the evolution of the
ClassMapper concept. The current system utilizes the ClassMapper concept with
ClassMaps of each database. As mentioned in the Technology Used section, the
ClassMapper concept hopes to reduce heterogeneity by providing the database or data
source with a homogeneous presentation to the outside world. An evolved ClassMapper
would provide this to allow universal flexibility and expandability for practically all
systems that access multiple databases.
Scalability
In order to completely leverage the power of a database federation, the system
must be able to support many databases simultaneously. If it is the case that only a small
number of databases can be queried against during a single federated query, then the
utility of the system drops dramatically. Especially in the realm of biological research,
multiple databases must be used at the same time, or the data could be incomplete.
The design of the federation platform theoretically scales to an unlimited number of end-databases. This is primarily because of how the local database is used.
As is mentioned later in the thesis [See The Federation Platform Design], when
a federated query is submitted, the federation platform copies the vital data of the
accessed tables to the local database. This is performed one table at a time until all required
table information is transferred to the local database. Once all of the information is
aggregated, the federation platform finally runs a query on the local database. The results
are then returned to the user.
In some sense, the local database effectively acts as a buffer between the end-databases
and the user [See Figure 2]. By looking at the system in this framework, the
table information from the end-databases is collected in the local database until all of the
required data is transferred. Each transaction of copying a partial table from an end-database
to the local database is done separately so that transfers do not consume large
amounts of system resources. The system can regulate how many resources are used
for copying to ensure that the system does not become overloaded. Even if there is a
large number of tables that need to be copied, the system can transfer the data as quickly
or as slowly as system resources permit. Once
all of the tables are inserted into the local database, the local database can be queried for
the results to be returned back to the user.
This approach allows the system to scale to a practically unlimited number of table
transfers since the local database is used as a buffer for table information. The tables are
added piece by piece into the local database until all of the tables are collected. In
addition, since the platform can regulate the transfers by system resources, the size of
transactions does not matter. Ultimately, this design allows the system to scale as more
tables and databases are added to the database federation.
Transparency
Transparency of the accesses to end-databases is important because of the potential
complexity of the database federation. Since the users of the system could get confused
handling the database operations, all database accesses were managed by the federation
platform. The targeted users of the system are researchers who may have little or no
experience with database management. By hiding this from the users, the database
federation appears to act as one large, single database to the user. This transparency adds
to the ease-of-use of the entire system.
In addition to the added ease-of-use benefit, hiding the transactions of the
underlying databases increases the overall security of the system. If the transactions of
the end-database were observable, potential hackers could trace extra data about each
end-database. This extra data could contain information that could help the hacker
discover the location of the database and exploit holes in the system. This transparency
helps to avoid security problems by removing the user from the entire transaction
process.
Portability
Since the federation platform was written in Java and uses JDBC for connectivity
to the local database, it can be run on any up-to-date Java VM. With the support of Java
VMs on MacOS, many flavors of Unix, and Windows operating systems, the platform
can be run on a wide variety of machines with practically no modifications to the code.
In addition, JDBC is platform neutral, thus requiring no major connectivity
changes in the code.
Although JDBC allows the server and application to be on different platforms,
Informix itself is written for a variety of operating systems. This further increases
portability of the system since even the local database can easily be ported to different
machines.
Usability
Because many database federation systems attempt to accommodate so
many heterogeneous data sources, their querying languages are difficult to
learn and use. Researchers who use biological databases do not usually come
from a strong computer science background and often find learning a new computer
language daunting.
This federation platform is quite usable when compared against other systems
that use a cumbersome querying language. This is because queries in the federation
platform are similar to standard SQL queries in that they follow a "SELECT-FROM-WHERE" clause format. By using queries similar to SQL, users who already know SQL
can immediately begin using the federation platform since they are familiar with how
queries are formed. Users who are not familiar with SQL can pick it up quickly since
the learning curve is gentle. Overall, the design decision to use this querying language
format makes the system more usable both to those who are familiar with standard SQL
and to those who are not.
III. Technology Used in the Federation Platform
A. Storage and Processing with a Local Database
To utilize the functionality built into database management systems, a decision
was made to implement a localized database in the system. This was done for several
improvements in design and performance. They are as follows:
Latency and Throughput
By storing the table information on a system near the federated platform, the table
information can be obtained with a low latency (low access time) and a high throughput
(large bandwidth). Ideally, the database will run on the same server as the federation
platform so that network interfaces will not hinder the overall performance of the system.
However, even if the database resides on another server in the intranet, current network
speeds of 10baseT or 100baseT (maximum theoretical throughputs of 1.2MB/sec and
12MB/sec, respectively) are adequate to serve the information flowing between the
database and the platform effectively.
If caching of the tables is used in future implementations, then the federation
platform must access the data multiple times. In order to make the remote database tables
available to the system for multiple accesses, the data needs to be stored in a location
where it can be accessed quickly. Accesses to a "local" copy of the tables will reduce the
amount of time it takes to process a federated query since the data would not need to be
fetched again. For a caching system in the future, it would be ideal to cache large
amounts of data in the local database and have the federation platform check its freshness
before it is retrieved from the cache. This may be necessary, especially for the Human
Genome Database and other large databases, where the lack of a caching component
would potentially require gigabyte-sized fetches.
Storage Management and Scalability
The fetched table data from separate end-databases must be stored before it is
processed and sent back to the user. If the tables are stored as JDBC Java objects, large
amounts of memory are consumed unless the objects are written to disk. However, even
if the tables are written to disk, to efficiently process the table information, all table
objects must be loaded into the memory of the system. Thus, using a local database helps
to overcome these problems in managing the database information. Utilizing a local
database simplifies the information management and efficiently handles storage since
databases are built with these goals in mind.
Storage scalability is another advantage that comes when using a local database.
The databases and system resources of today can handle information sizes that are on the
order of many gigabytes. Storing large amounts of data from end-database tables is not a
problem with a local database. If the user of the federation platform anticipates that a
query will require huge amounts of table data to be accessed, then the space of the local
database can be adjusted accordingly.
In addition, the local database provides the added benefits of security, recovery,
and data integrity that are already built into the Database Management System (DBMS).
Although these features are not part of the design goals, they could come into use for
future goals of the federation platform.
Efficient Query Processing
Before results can be sent back to the user, tables from separate end-databases
must be fetched and processed according to the conditions of the query. Since it is partial
table results that are returned from the end-databases, it seems natural to store these
tables in a database and process the information from there. Building a module that could
process the data based on the SQL conditions of a federated query would basically be
reinventing the query engine of a standard database management system. Therefore, it
was decided that it would be more efficient to use the query engine of a database instead
of a self-made data processing module. The most practical way of using a pre-existing
query engine was to implement a local database into the system. By adopting the query
engine of the local database, the processing capabilities of the federation platform
became as scalable and efficient as the local database itself.
Future Benefits
To deal with more complex queries in the future, the federation platform can use
the built-in functionality of the local database to aid in processing. For instance, many
databases have object-relational schemas. To be able to have these schemas be supported
by the federation platform, the system must be able to handle object-relational
processing. In future versions, the system could be tailored to support object-relational
operations by using the preexisting processing capabilities of the local (object-relational)
database.
With the local database, caching of table data is a feature that could be added to
the federation platform. Table data fetched from end-databases could be stored in the
local database and reused until the information expires. The caching scheme would have
to incorporate a time stamp to calculate the "freshness" of the data in the tables since
there would be no guarantee that the cached data would be up-to-date. Further details
would be determined later when this functionality is implemented into the system.
Regardless, having the local database could tremendously reduce the amount of code
needed to add a caching feature.
B. Interface and Transport with JDBC
Java DataBase Connectivity (JDBC) has recently emerged as a growing standard
for database connectivity. With the explosion in adopters of Java, Java-based standards
have emerged. What JDBC provides Java developers is a standard API that is used to
access databases, regardless of the driver and database product. That means that any Java
application can connect to nearly any database no matter what platform the application
runs on or where the database resides. This is possible in part because of the acceptance of the
JDBC standard by both Sun, the creators of Java, and all major database vendors
including Oracle, Informix, and IBM. This portability makes JDBC ideal for applications
that need to access databases over networks. For the design of this system, JDBC is used
for database connectivity for the end-databases as well as the local database. This
decision was based on the following advantages that JDBC provides:
Figure 1. A client connecting to a database via JDBC.
Simplicity and Versatility
A large amount of value is gained by using JDBC because of the simplicity of
data access and data manipulation. In JDBC, the lower level connectivity layers are
hidden from the developer to allow easier data access. The user only has to specify a
valid TCP/IP location of the server running the database and the correct authentication
for a connection. Figure 1 above demonstrates how a client machine connects to a
database via JDBC.
During JDBC operations, database queries are returned as ResultSet Java objects
from the java.sql package. A single ResultSet object contains the complete information
that is normally returned from the query. The data contained in the object can then be
accessed programmatically by traversing through the rows and columns. Field types and
values are easily extracted and turned into Java objects or primitives through the standard
methods in the ResultSet object. JDBC allows scrolling through the ResultSet object so
that the programmer can quickly jump to any row, column, or field. Even inserting data is
made easy by using methods in the package that allow for inserting rows
programmatically (as opposed to sending a database-specific text insert statement to the
database). Support for batch updates is another key feature that makes JDBC very
attractive.
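As a brief illustration of these APIs, the following self-contained snippet runs a query
and traverses the ResultSet. The connection URL, credentials, table, and column names
are placeholders, not part of the federation platform:

    import java.sql.*;

    public class ResultSetDemo {
        public static void main(String[] args) throws Exception {
            // Open a connection; any valid JDBC URL and credentials would do.
            Connection con = DriverManager.getConnection(
                    "jdbc:informix-sqli://dbhost:1013/gatc:INFORMIXSERVER=demo",
                    "user", "password");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery(
                    "SELECT name, intensity FROM experiment");
            // Traverse the rows; fields are converted to Java types on access.
            while (rs.next()) {
                String name = rs.getString("name");
                double intensity = rs.getDouble("intensity");
                System.out.println(name + ": " + intensity);
            }
            rs.close();
            stmt.close();
            con.close();
        }
    }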
Object-Relational Support
JDBC also goes beyond relational processing. The movement for better object-relational
database connectivity has pushed JDBC to support object-relational features.
Support for various data types already exists, with limitations on handling large
objects; however, the scalability and the expanding number of supported data types will
make JDBC the preferred approach to handling object-relational data access. As
mentioned in previous sections, strong object-relational support is key for biological
databases since much of the associated information is metadata that must be stored as
objects. JDBC's continued movement in this direction will ultimately help make it a
tool for managing biological information.
ODBC Comparison
Currently, other forms of connectivity such as Open DataBase Connectivity
(ODBC) and ODBC <-> JDBC bridges still exist since they are connectivity standards
still used today. However, the major database vendors have recognized the growing
demand for Java applications and have shifted their focus from developing ODBC to
JDBC. With this progression toward a more Java-centric world, ODBC is slowly becoming
outdated.
Since ODBC is accessed via C or C++, client software must be written in C or
C++. While this is appropriate for many scenarios (where the point of database access is
a C or C++ program), to use Java with a database that only connects via ODBC, an
ODBC <-> JDBC bridge has to be used. While this conversion mechanism works for
most cases, the bridge increases the number of inter-operating parts and potential sources
of failure for the system. When a Java program is being used to communicate with a
database, it is best to use pure JDBC.
Current Implementation
Informix uses a type 4 JDBC driver. A type 4 driver is a pure Java driver that
uses a native protocol to convert JDBC calls into the database server network protocol.
Using this type of driver, the Java application can make direct calls from a Java client to
the database. A type 4 driver, such as Informix JDBC Driver, is typically offered by the
database vendor. Because the driver is written purely in Java, it requires no configuration
on the client machine other than telling the application where to find the driver. Once the
driver is loaded, the application can access the database via the JDBC interfaces.
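A minimal sketch of this setup follows, assuming the java.sql classes are imported; the
host, port, server name, database, and credentials below are placeholders:

    // Register the Informix type 4 driver, then connect directly from Java.
    Class.forName("com.informix.jdbc.IfxDriver");
    Connection con = DriverManager.getConnection(
            "jdbc:informix-sqli://dbhost:1013/fedlocal:INFORMIXSERVER=demo",
            "informix", "password");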
In the Federation platform implementation proposed in this document, JDBC
database adapters were constructed as part of the platform. The reading adapter interface
that was created fetches database information via JDBC and passes the table information
back to the platform. For the local database that is located on the intranet (for low
latency and high bandwidth), an additional JDBC adapter was made with database
writing capabilities. Using JDBC for reading from the database federation and writing to
the intranet database allows for very clean and compact code for table transfers. (See
Appendices SQLJDBCHandler.java and InformixJDBCHandler.java).
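The actual adapter sources appear in the Appendix; the following is only a hypothetical
sketch of the shape such a reading interface might take, with invented method names:

    import java.sql.ResultSet;

    // Hypothetical reading-adapter interface: each end-database handler
    // knows how to connect and hand table data back to the platform.
    public interface ReadHandler {
        void connect(String host, int port, String database,
                     String user, String password) throws Exception;
        ResultSet fetchTable(String selectQuery) throws Exception;
        void close() throws Exception;
    }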
C. ClassMapper Concept
In the world of biological databases, the databases and data sources are very
heterogeneous. The ways the systems are accessed vary across different interface types, and
their underlying database structures vary greatly as well. A query for one database may
have a completely different syntax or semantic structure than a query for another. Many
of these biological databases have query languages that were designed without the goal of
using pre-existing semantics. Therefore, many of these databases have very different
interfaces. In order to access a variety of these databases, the user must learn how to use
each one of them. In addition, if a programmer wishes to build an application that
accesses the databases, he must build a special interface for each heterogeneous database.
Thus, there is a need for standards when it comes to biological data sources [22].
Patrick McCormick's document [2] details the concept of a ClassMapper. The
main motivation for this concept is to conquer the heterogeneity of data sources that is so
prevalent across medical and biological databases. A ClassMapper is an application that
"sits on top" of a database (or data source) to standardize its presentation to the outside
world. All communication between the ClassMapper and the database are hidden since
the ClassMapper serves all information requests from the user. The added benefit is that
the user can interact with each ClassMapper in the same way since the interface is
standardized across every ClassMapper. Therefore, in some sense, each database "looks
the same" to the user since obtaining information is performed in the same manner. With
a ClassMapper residing on each database, all appear to be homogeneous. This concept is
still being refined; however, it is apparent that standards need to be put in place to
overcome heterogeneity.
The concept of the ClassMapper was used as part of the federation platform.
Since the standards for ClassMappers are still undefined for the most part, descriptions of
the HGDB and the GATC database were used. These descriptions are called ClassMaps
since they are standardized descriptions of the databases. These ClassMaps were
obtained from William Chuang's work described in [21]. In this implementation, the
ClassMaps were extended to include connectivity information to the end-databases. They
were used by the federation platform to build a map of the tables contained in each
database.
IV. The Federation Platform Design
Starting the Federation Platform
Before the system can be used to query across the federation, the accessible
databases must be properly registered with the federation platform. For each database in
the federation, the ClassMap must be registered. The ClassMaps in the current
implementation contain not only the table information about the databases, but also the
network location and authentication keys. These ClassMaps are stored as local files on
the server running the federation platform and are automatically loaded when the
application starts. Once the platform has started, it is ready to accept queries from the
user.
How a Query Is Structured
A distributed query is similar to the standard SQL format [39]. The queries are
structured with "SELECT", "FROM" and "WHERE" clauses. These clauses must be
ordered correctly or else the system will not function properly. The order must start with
"SELECT", followed by "FROM" and then followed by "WHERE". This restriction in
clause order is similar to the rules in standard SQL.
In standard SQL, the column or columns specified in the "SELECT" clause are
the columns that will return their results to the user. The "FROM" clause contains the
table or the list of tables that are accessed by the query. If another clause in the query
attempts to access a table that isn't explicitly declared in the "FROM" clause, the query is
not processed. Therefore, all tables used in a query must be declared in the "FROM"
clause. The "WHERE" clause contains an optional list of conditions to restrict the
information returned back to the user. These conditions can be set as equalities or
inequalities, comparing columns against values or columns against other columns.
See [39] for more details about SQL.
In the federation platform, SQL queries are structured in practically the same way
as standard SQL. A federated query has "SELECT", "FROM", and "WHERE" clauses
that must be placed in the same order as a standard SQL query. Since all tables in the
database federation are registered when the ClassMaps are loaded, when a table is
referenced in any of the clauses, the federation platform knows if the table exists and on
which database the table resides. Therefore, the user needs only to specify table names
and columns in the clauses; the platform takes care of the rest. Because the location of
each table is hidden, the user can view the database federation as one large, single
database and can query against it as if all of the end-databases were combined into one.
This functionality meets the design goal of creating transparency for the user.
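For example, the following federated query is written exactly as it would be against a
single database. The table and column names are invented for illustration; ExpData
might reside in the GATC database and GeneInfo in the HGDB:

    SELECT ExpData.intensity, GeneInfo.locus
    FROM ExpData, GeneInfo
    WHERE ExpData.accession_id = GeneInfo.accession_id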
Within the federation platform, the query is transformed internally into a query
called a DBPath query. This type of query contains end-database names as prefixes to
each table, in the form [DatabaseName]->[TableName], or
[DatabaseName]->[TableName].[ColumnName] if a column is specified. This syntax
makes explicit references to specific end-databases instead of relying on ClassMaps. The
system has the capability to accept DBPath queries directly from the user if he chooses to
use this format. This feature is useful when the user wants to specify exactly where the
table is being retrieved. More details about the DBPath format are explained in the
Architecture section.
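Continuing the invented example above, the platform would internally rewrite the
federated query into a DBPath query along these lines:

    SELECT GATC->ExpData.intensity, HGDB->GeneInfo.locus
    FROM GATC->ExpData, HGDB->GeneInfo
    WHERE GATC->ExpData.accession_id = HGDB->GeneInfo.accession_id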
When a Query Is Submitted
After a query is passed into the federation platform, the text of the query is passed
into a decomposer module. The query is decomposed into end-database queries based on
rules that are coded into the platform. In several other federated database systems such as
those in [24] [26] [43], the systems decompose queries according to rules that are stored
in a separate knowledge base that is adjacent to the system. These systems have their
rules detached from the main system and read in by a module before processing any
queries. Having a rule reader as part of the main implementation reduces the
amount of work needed when upgrading the rule set. It also makes it easier
for users who are not familiar with the source code to see how the system
decomposes queries. However, to build a rule reader module, a flexible and upgradeable syntax must be
created. Building this module will also take a considerable amount of extra work beyond
building a single module with the rules hard-coded. This separation of rules from the
main system should be investigated in the future to see whether this design decision is
appropriate for the federation platform.
The logic of the current implementation is to copy table information from the end-databases into the local database. The federation platform does this systematically by
parsing the federated query to determine each table that is accessed by the query. Once
all of the tables are determined, the platform constructs end-database queries to
retrieve those tables. Several optimizations are put into the end-database queries to
download only certain parts of the tables. Specifically, the "WHERE" conditions are
inserted into certain end-database queries when possible to narrow down the information
returned by the end-databases. This reduction in data transfer decreases the amount of
time it takes to retrieve all of the table information from the databases for the federated
query.
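To make this concrete, suppose the invented query from the previous section also
carried the condition ExpData.intensity > 100. The platform might then issue
end-database queries along these lines, pushing the single-table condition down to the
GATC database, while the cross-database join condition must wait for the local database:

    Sent to GATC:  SELECT * FROM ExpData WHERE intensity > 100
    Sent to HGDB:  SELECT * FROM GeneInfo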
Once all of the table information is retrieved, it is programmatically inserted into
temporary tables on the local database. The original federated query is transformed into a
query that is usable by the local database to query across all of the new tables. This
query is called the "aggregate query" since it is performed after all of the table
information is aggregated on the local database. From there, the federated architecture
lets the local database handle the query processing for the federated query. The results
from the local database are then returned to the user. The results that are sent back to the
user appear as if they came from a single database that contains all of the combined
information across the entire database federation.
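For the invented example, if the partial tables were copied into temporary tables named
tmp_ExpData and tmp_GeneInfo (the naming scheme here is hypothetical), the aggregate
query run on the local database could look like:

    SELECT tmp_ExpData.intensity, tmp_GeneInfo.locus
    FROM tmp_ExpData, tmp_GeneInfo
    WHERE tmp_ExpData.accession_id = tmp_GeneInfo.accession_id
      AND tmp_ExpData.intensity > 100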
Once the results are sent back to the user, the local database no longer needs to
store the retrieved table information. The tables are subsequently dropped to make sure
the local database is not congested with old table data. This completes the federation
platform's execution of a federated query. The following figure demonstrates the
transactions that occur for a federated query.
Figure 2. The transactions during a federated query.
V. Architecture
ClassMapRepository
Before the federation platform can be used, the platform must have access to the
ClassMaps of each respective database in the federation. As mentioned earlier, the
ClassMaps provide a high-level description of each database and the means of connectivity
to it. As the architecture stands, the ClassMaps are read from local files on the
same machine as the application, but the platform has the capability to read the ClassMaps
from practically any source, whether passed as a Java String object or fetched from a
remote server. This ability comes from Java's built-in connectivity libraries.
When the ClassMaps are loaded into the ClassMapRepository object, the header
that contains the connectivity information is read first. The header is written such that the
federation platform knows everything it needs to connect to the database. An example of
one of the headers is as follows:
----- CLASSMAPPER INFO -----
DATABASE_ALIAS: GATC
CONNECTIVITY: InformixJDBC
DATABASE_IP: 18.999.0.156
PORT: 1013
DATABASE_NAME: gatc
ADDITIONAL_PARAMETERS: INFORMIXSERVER=BENFU
AUTHENTICATION(user): informix
AUTHENTICATION(password): F43m2#lm.y
----------------------------

Figure 3. An example connectivity header for the GATC ClassMap.
After the header information is read, the repository continues to read the file and
begins associating tables with their database name. The table names are stored with their
corresponding database names in a hash table. A hash table holds a list of
keys with one corresponding value object per key and is used for looking up
values based on keys. The structure does not allow identical keys, so
it is guaranteed to have only one value for a single key. An identical value, however,
can be associated with multiple keys. The structure of a hash table is set up in such a
way that key-value lookups take, on average, constant time [46]. That is, as the set of
entries grows, lookup times stay essentially the same. Memory use, meanwhile, grows
only linearly as more values are added. This behavior makes hash tables ideal for
scaling fast lookups to large data sets.
Since one of the design goals was to scale to a high number of databases, the use
of a hash table was natural. In the ClassMapRepository object, the designated hash table
is used for fast lookups of table-to-database mappings and for conflict notification in the
event that more than one database has the same table name. As each ClassMap is loaded,
each of its tables is added to the hash table and associated with a mapping to the database
that contains it. One ClassMap object is instantiated per ClassMap loaded to hold its
connection and authentication information. This continues until all tables are mapped
from each of the processed ClassMaps.
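As a minimal sketch of this mapping, assuming nothing beyond java.util.Hashtable (which the implementation uses) and illustrative table and database names:

import java.util.Hashtable;

// Sketch only: the table-to-database mapping with conflict notification.
public class TableMappingSketch {
    public static void main(String[] args) {
        // Key: table name; value: the database (alias) that contains it.
        Hashtable tableToDatabase = new Hashtable();
        tableToDatabase.put("chip_design", "GATC");
        tableToDatabase.put("AccessObject", "HGDB");

        // Constant-time (on average) lookup of a table's home database.
        String db = (String) tableToDatabase.get("chip_design");  // "GATC"
        System.out.println(db);

        // put() returns the previous value for a key, or null; a non-null
        // return signals that two databases expose the same table name.
        Object previous = tableToDatabase.put("chip_design", "HGDB");
        if (previous != null) {
            System.out.println("Conflict: table already mapped to " + previous);
        }
    }
}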
Once all of the tables are processed into the ClassMapRepository, the system
moves on to constructing a DistributedQuery object. The repository is retained within
the federation platform since it is referenced several times to find the mappings between
tables during the execution of the federated query. However, the ClassMapRepository
object is not modified later; it is only accessed to pull out database-table mappings.
DistributedQuery (Data Structure)
Once the repository has been established, the platform begins to construct its main
data structure, a DistributedQuery object. At the creation of the object, only the
federated query (in String form) is set in it. The DistributedQuery
data structure is used throughout the architecture to store the ClassMapRepository object,
the transformed federated queries, and the decomposed database queries. Detailed
descriptions of when the different members are used are given in the following
sections. The structure of the DistributedQuery object can be seen in the
following figure:
[Figure: the DistributedQuery object holds queryString, dbPathQueryString, aggregateQuery, the classMapRepository, and a monoDBHashtable of SQLMonoDBQuery objects; each SQLMonoDBQuery holds a databaseName and a tableQueryHashtable of SQLTableQuery objects; each SQLTableQuery holds a tableName, dbName, selectVector, fromVector, and whereVector]
Figure 4. DistributedQuery data structure object.
The DistributedQuery object contains three String versions of the query: the
original federated query, the federated query mapped with its database paths (the DBPath
query), and the "aggregate query," the final query that is used against the local database.
The modules in the system that set these last two members are described later in
this section. In addition to the transformed queries, the DistributedQuery object contains
an object-oriented structure for the decomposed queries.
It was decided that for an acceptable object-oriented design, the hash table data
structure would be used. As mentioned above in the ClassMapRepository section, using
a hash table allows the system to scale appropriately as more values are added. This
scalability is applicable to decomposed queries since, in theory, as more databases are
connected to the federation, the number and complexity of the decomposed queries will
grow.
In designing how the data structures would be used in the system, it was
determined that each table query should be its own object. Since each table accessed by a
federated query has to be partially reconstructed on the local database as a new table, it
seemed logical to handle each transaction on a "per-table" basis. Thus, the
SQLTableQuery object was used to encapsulate this abstraction. Each SQLTableQuery
object contains a list of SELECT, FROM, and WHERE arguments as well as methods to
modify the lists. The current state of the architecture only accepts these three SQL
clauses. However, since the design of the system is modular such that each clause is
stored as a list of arguments, adding more SQL vocabulary to the SQLTableQuery only
requires adding another list of arguments. This design provides relatively quick data
structure upgrades for expanding the querying capabilities to the end-databases in the
federation. In the event of an upgrade, more logic will have to be programmed into the
query processor(s) of the architecture to accommodate the expanded features in the
SQL vocabulary.
Moving to a higher level of abstraction is the SQLMonoDBQuery object. Each
end-database in the database federation is designated a SQLMonoDBQuery object, which
contains all of the SQLTableQuery objects used to query against that specific
database. Grouping queries by database allows for easier management, navigation, and
lookup of queries. Every SQLMonoDBQuery object stores each of its SQLTableQuery
objects as values in a hash table, with the corresponding key being the name of the table
retrieved. Using the table name as a key allows for fast lookup
of the SQLTableQuery object that is used to retrieve the table.
Moving higher up in the abstraction level brings us to the DistributedQuery
object. Similar to the SQLMonoDBQuery-to-SQLTableQuery relationship, the
DistributedQuery object contains a hash table with database names as the keys and the
SQLMonoDBQuery objects as the values. This again allows for fast lookups of
SQLMonoDBQuery objects.
By using this design for the data structure, the federation platform can quickly
access all SQLTableQuery objects through logical groupings. As will be seen later in the
explanation of the architecture, parts of the federation platform utilize these groupings to
simplify the end-database query processing.
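A minimal sketch of this two-level lookup follows, assuming the Appendix classes (DistributedQuery, SQLMonoDBQuery, SQLTableQuery) are on the classpath; the database and table names are illustrative.

// Sketch only: registering and then looking up a per-table query through
// the two hash table levels described above.
public class DataStructureSketch {
    public static void main(String[] args) {
        SQLTableQuery tq = new SQLTableQuery();
        tq.setDBName("GATC");
        tq.setTableName("chip_design");
        tq.addFrom("chip_design");
        tq.addSelect("chip_design.name");

        SQLMonoDBQuery gatcQueries = new SQLMonoDBQuery();
        gatcQueries.setDatabaseName("GATC");
        gatcQueries.putTableQuery("chip_design", tq);   // key: table name

        DistributedQuery dq = new DistributedQuery("SELECT chip_design.name FROM chip_design");
        dq.putMonoDBQuery("GATC", gatcQueries);         // key: database name

        // Two constant-time hash table lookups recover the per-table query.
        SQLMonoDBQuery byDb = (SQLMonoDBQuery) dq.getMonoDBQuery("GATC");
        SQLTableQuery byTable = (SQLTableQuery) byDb.getTableQuery("chip_design");
        System.out.println(byTable.toSQLString());
    }
}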
QueryDecomposer
After the ClassMaps have been processed, the system is ready to accept federated
queries. When the client submits a federated query, the platform begins by
instantiating a QueryDecomposer object and passing the query (in String form) to it. The
QueryDecomposer object then constructs a DistributedQuery object and immediately
stores the federated query into the data structure. From this point on, only the
DistributedQuery is passed between the different modules.
Before the DistributedQuery object is passed to the next module, the
QueryDecomposer makes a transformed copy of the federated query and stores this in the
DistributedQuery object. More specifically, the decomposer object transforms the query
such that each table is prefixed with a path to its respective database (the DBPath query).
The DBPath prefix is just the database name followed by the characters "->", which
together resemble an arrow. For each table referenced in the query statement, the
QueryDecomposer performs a lookup on the ClassMapRepository to determine which
database the table belongs to.
As mentioned in the previous section, transforms for each table have the form
[DatabaseName]->[TableName] or [DatabaseName]->[TableName].[ColumnName] in
the query with DBPaths. For example, the table CHIP_DESIGN stored in the GATC
database would be transformed into GATC->CHIP_DESIGN in the new query. If the
query is referencing a specific column in the database, the prefix stays the same.
Therefore, the reference to the column CHIP_DESIGN.NAME in the GATC database
would turn into GATC->CHIP_DESIGN.NAME in the DBPath query.
If the transformation is successful, the DBPath query is stored in the
DistributedQuery object. In the event that a table cannot be found in the
ClassMapRepository, the table name in the DBPath query is replaced with the string
[NOT IN CLASSMAP]. Malformed references are also flagged with a [MALFORMED]
string. This allows the user of the platform to identify syntax and spelling mistakes in
the query's references.
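A minimal sketch of this transformation for a single column reference follows, assuming the Appendix ClassMapRepository class; the call to putTableDB stands in for loading a ClassMap:

// Sketch only: prefixing one [TABLE].[COLUMN] reference with its DBPath by
// looking the table up in the ClassMapRepository hash table.
public class DBPathSketch {
    public static void main(String[] args) {
        ClassMapRepository repo = new ClassMapRepository();
        repo.putTableDB("CHIP_DESIGN", "GATC");   // normally loaded from a ClassMap

        String reference = "CHIP_DESIGN.NAME";
        String tableName = reference.substring(0, reference.indexOf('.'));
        String dbName = (String) repo.getDBPath(tableName);
        String dbPathReference = (dbName == null)
                ? "[NOT IN CLASSMAP]"                 // table unknown to the federation
                : dbName + "->" + reference;          // "GATC->CHIP_DESIGN.NAME"
        System.out.println(dbPathReference);
    }
}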
SQLQueryParser
After the QueryDecomposer has finished inserting the DBPath query into the
DistributedQuery object, it passes the data structure to an instantiated SQLQueryParser
object. The SQLQueryParser begins the task of parsing the DBPath query. The decision
was made to have the QueryDecomposer object produce the DBPath query, rather than
having the parser consult the ClassMaps, for two reasons: 1) to decrease the dependency
of classes in the federation platform on the ClassMapRepository, and 2) to give the user
the option to manually query the federation platform with a DBPath query instead of a
normal query. The first reason was for a "cleaner" design while the second was for
access versatility for the user of the system.
Once the SQLQueryParser receives the DistributedQuery containing the DBPath
query, the object runs through several steps to break apart the query into end-database
queries. The SQLQueryParser object first breaks apart the DBPath query by its SQL
clauses. In the current implementation, the SELECT, FROM, and WHERE clauses are
separated. Once the clauses are separated, table names are extracted from each clause. If
a DBPath for a table or column is malformed in a query, the class recognizes the syntax
errors and prints the errors to the system console.
The table names from the FROM clause are extracted first since, in proper SQL
queries, all of the tables accessed by the query must be given in the FROM clause;
otherwise the system produces an error and does not attempt to execute the query. The
query form for the federation platform is the same: the SQLQueryParser produces an
error and does not process the DBPath query if it finds tables in other clauses that are
not explicitly declared in the FROM clause.
Since the queries conform to this standard format, the list of tables that follows the
FROM clause is first used to enumerate all of the tables to be accessed for the federated
query. For each table in the enumeration, the SQLQueryParser object instantiates a new
SQLTableQuery object designated to handle the query that will return all of the required
results from the table specified. That is, each constructed SQLTableQuery object
effectively stores a query that returns results that are all from one table, a table chosen
from the enumeration. As each table in the enumeration is given a SQLTableQuery, the
parser uses the DBPaths to correctly sort where each object goes. The parser object
navigates through the DistributedQuery data structure to correctly insert the
SQLTableQuery object into its appropriate SQLMonoDBQuery. At the time of insertion,
only the FROM clauses of the SQLTableQuery objects are set; the SELECT and WHERE
clauses are the next to be initialized.
The parsing of the SELECT clause immediately follows parsing the FROM
clause. The tables are again enumerated and processed individually until all table
SELECTs are accounted for. During this process, the system checks to make sure that
all tables accessed are declared in the FROM clause. As each SELECT column is
processed, the SQLQueryParser object extracts the DBPath and table name of each
column to search through the DistributedQuery data structure for the appropriate
SQLTableQuery object. The parser searches for the specific SQLTableQuery object that
is designated to return all of the results from that one table. Once the SQLTableQuery
object is found, the column is inserted into the SELECT clause of the object (with the
DBPath prefix removed). This assures that no columns are missed for the aggregate
query at the end of the platform's execution of the federated query.
Following the parsing of the SELECT clause comes the WHERE clause. The
SQL vocabulary of the current implementation only allows multiple equality or
inequality conditions. Sub-queries and table aliases cannot be used as of yet. In addition,
the order of evaluation goes from left to right instead of SQL's normal AND-then-OR
order [39].
The logic of the parser for decomposing the WHERE conditions can be
simplified into three different rules. The action that the SQLQueryParser takes depends
on the type of condition. The rules are as follows:
1) If the WHERE condition involves only one table, insert the condition into the
WHERE clause of the SQLTableQuery object that handles the results of the
column's table. Additionally, insert the specified column into the SELECT clause
of the object. The inserted condition has its DBPaths stripped before it is
inserted.
[Example:
  condition            HGDB->AccessObject.submitter = "Ben Fu"
  is stripped to       AccessObject.submitter = "Ben Fu"
  then inserted into the AccessObject SQLTableQuery object
  in its WHERE as      AccessObject.submitter = "Ben Fu"
  and SELECT as        AccessObject.submitter]
2) If the WHERE condition involves two different tables both in a single database,
insert the condition into each WHERE clause of both SQLTableQuery objects that
handle the results of the columns' tables. Additionally, insert the specified
columns into the SELECT clauses of their respective objects. The FROM clause
of each accessed SQLTableQuery must contain both table names. The inserted
condition has its DBPaths stripped before it is inserted.
[Example:
  condition            HGDB->AccessObject.submitter = HGDB->Contact.displayName
  is stripped to       AccessObject.submitter = Contact.displayName
  then inserted into the AccessObject SQLTableQuery AND Contact SQLTableQuery objects
  in both WHEREs as    AccessObject.submitter = Contact.displayName
  and both FROM clauses have   AccessObject, Contact
  insert into the AccessObject SQLTableQuery object the column AccessObject.submitter
  insert into the Contact SQLTableQuery object the column Contact.displayName]
3) If the WHERE condition involves two different tables on two different databases,
do not insert the condition into any SQLTableQuery objects; the aggregate query
will enforce this condition at the end. Insert the specified columns into the
SELECT clauses of their respective objects.
[Example:
  condition            HGDB->AccessObject.name = GATC->biological_item.item_name
  is stripped to       AccessObject.name = biological_item.item_name
  insert into the AccessObject SQLTableQuery object the column AccessObject.name
  insert into the biological_item SQLTableQuery object the column biological_item.item_name]
Because the DistributedQuery object groups the SQLTableQuery objects by
database in a list of SQLMonoDBQuery objects, the DBPath information is preserved in
the object relationships.
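A minimal, self-contained sketch of the three-rule dispatch follows. This is not the thesis code: the helper methods and the hash tables standing in for SQLTableQuery objects are illustrative only, and the sketch covers just the example conditions above.

import java.util.*;

// Sketch only: dispatching DBPath-prefixed WHERE conditions such as
// "HGDB->AccessObject.submitter" according to the three rules above.
public class WhereDispatchSketch {

    static String dbOf(String ref)    { return ref.substring(0, ref.indexOf("->")); }
    static String strip(String ref)   { return ref.substring(ref.indexOf("->") + 2); }
    static String tableOf(String ref) { String s = strip(ref); return s.substring(0, s.indexOf('.')); }

    // Stand-ins for per-table SQLTableQuery objects: table name -> clause list.
    static Hashtable where = new Hashtable();
    static Hashtable select = new Hashtable();

    static void add(Hashtable m, String table, String s) {
        Vector v = (Vector) m.get(table);
        if (v == null) { v = new Vector(); m.put(table, v); }
        v.addElement(s);
    }

    static void dispatch(String left, String right) {
        if (right.indexOf("->") == -1) {
            // Rule 1: single table (column vs. literal) -- push the stripped
            // condition down, and SELECT the referenced column.
            add(where, tableOf(left), strip(left) + " = " + right);
            add(select, tableOf(left), strip(left));
        } else if (dbOf(left).equals(dbOf(right))) {
            // Rule 2: two tables, one database -- push into both table queries.
            String cond = strip(left) + " = " + strip(right);
            add(where, tableOf(left), cond);
            add(where, tableOf(right), cond);
            add(select, tableOf(left), strip(left));
            add(select, tableOf(right), strip(right));
        } else {
            // Rule 3: two tables, two databases -- no pushdown; the aggregate
            // query enforces the join, so only SELECT the needed columns.
            add(select, tableOf(left), strip(left));
            add(select, tableOf(right), strip(right));
        }
    }

    public static void main(String[] args) {
        dispatch("HGDB->AccessObject.submitter", "\"Ben Fu\"");                  // Rule 1
        dispatch("HGDB->AccessObject.submitter", "HGDB->Contact.displayName");   // Rule 2
        dispatch("HGDB->AccessObject.name", "GATC->biological_item.item_name");  // Rule 3
        System.out.println("WHERE:  " + where);
        System.out.println("SELECT: " + select);
    }
}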
Query optimizations were investigated in [47] [48]. It was determined that many
of these optimizations were beyond the scope of the immediate design goals of the current
system. However, for the sake of increasing speed and efficiency, optimizations should
be kept in mind during future development of the federation platform.
Finally, when all of the conditions in the WHERE clause of the DBPath query are
processed, the DistributedQuery object contains all of the SQLTableQuery objects,
or decomposed end-database queries, needed to perform the federated query. The
federation platform then moves on to begin querying the end-databases.
DBDelegator
Once the federated query is decomposed and the DistributedQuery is populated with
the individual queries, the federation platform instantiates a DBDelegator object. The
DBDelegator handles the connectivity and query execution of all databases in the
federation. In addition, the delegator object handles the connectivity, result
insertion, and query execution of the local database. It does this by interfacing with
handler objects specific to each database. Each member in the database federation
has its connectivity information encapsulated in the header of its ClassMap. To establish
connectivity to each database, the DBDelegator begins by instantiating a specific handler
object for each database. Each handler is created based on the parameters specified by its
ClassMap. Similarly, a handler for the local database is instantiated, except its parameters
are hardcoded into the system and not read in from ClassMaps. These handlers are
designated to be the connectivity interface between the database and the DBDelegator.
For the current implementation of the system, an InformixJDBCHandler class was
created. The reasons why JDBC was used for connectivity are mentioned in the previous
section. The handler class was tailored to connect to Informix databases because all of
the database systems used in the current implementation were Informix databases.
However, the only portions of the InformixJDBCHandler class that are specific to
Informix are the connectivity parameters; the rest of the class is generalizable to any
database that can handle JDBC calls. This handler object will be discussed in detail
following the explanation of the DBDelegator.
Once the handlers are initialized for the local database and the databases in the
federation (the remote databases), the DBDelegator begins processing the end-database
queries. The delegator object starts the process by requesting an enumeration of all
registered databases from the DistributedQuery object. The corresponding
SQLMonoDBQuery object for the first database is looked up from the DistributedQuery.
Once the object is found, the DBDelegator then uses the database's handler to begin
querying the end-database. Each SQLTableQuery object in the SQLMonoDBQuery is
converted into a SQL query string and sent to the handler. After the query is executed,
the handler returns a java.sql.ResultSet object that encapsulates all of the data and
metadata of the returned results. For each result set returned by the handler, the delegator
passes the ResultSet to the local database's handler. The insertion is performed by
methods in the handler that create a new table and place the data into it. The name of the
new table is in the form [Database]_[TableName]. For example, results taken from the
AccessObject table in the HGDB database would cause the local handler to create a new
table called HGDB_AccessObject. Naming tables this way assures that newly created
table names do not conflict while making it easy to identify where the data came from.
This process is repeated for all of the remote databases. A sketch of this loop follows.
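This is a minimal sketch, not the thesis code: it assumes the Appendix classes, and the nested Handler interface, with a two-argument insertResultSet(), is an illustrative stand-in for the JDBC handlers described below.

import java.sql.ResultSet;
import java.util.Enumeration;

// Sketch only: the per-database, per-table fetch-and-copy loop.
public class DelegatorLoopSketch {
    interface Handler {
        ResultSet getResultSet(String sql) throws Exception;
        void insertResultSet(ResultSet rs, String newTableName) throws Exception;
    }

    static void runFederatedFetch(DistributedQuery dq,
                                  java.util.Map handlers,   // dbName -> Handler
                                  Handler local) throws Exception {
        Enumeration dbs = dq.getMonoDBKeys();
        while (dbs.hasMoreElements()) {
            String dbName = (String) dbs.nextElement();
            SQLMonoDBQuery mono = (SQLMonoDBQuery) dq.getMonoDBQuery(dbName);
            Enumeration tables = mono.getTableKeys();
            while (tables.hasMoreElements()) {
                String tableName = (String) tables.nextElement();
                SQLTableQuery tq = (SQLTableQuery) mono.getTableQuery(tableName);
                // Execute the decomposed query remotely, then copy the results
                // into a local table named [Database]_[TableName].
                ResultSet rs = ((Handler) handlers.get(dbName)).getResultSet(tq.toSQLString());
                local.insertResultSet(rs, dbName + "_" + tableName);
            }
        }
    }
}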
Finally, when the process of executing all of the decomposed queries is finished,
the DBDelegator object transforms the DBPath query into an "aggregate query." That is,
it converts the DBPath query into a query that is used against the local database once all
of the remote data is finally aggregated. The aggregate query is equivalent to the original
federated query, with the only change being that all table names are prefixed with the
database name and an underscore (as mentioned in the table naming scheme above). This
is because the tables on the local database now contain the remote information under new
table names that comply with the naming scheme. See the figure below for an example
transaction by the DBDelegator.
[Figure: the DBDelegator converts a SQLTableQuery into a SQL string via SQLTableQuery.toSQLString() (SELECT chip_design.id, chip_design.name FROM chip_design WHERE chip_design.id = '375173'), sends it through the Informix JDBC handler to the GATC database, receives a java.sql.ResultSet, extracts its java.sql.ResultSetMetaData to issue "CREATE TABLE GATC_chip_design (... name VARCHAR(32) ...)" on the local database through its handler, and finally runs the aggregate query (SELECT GATC_chip_design.name FROM GATC_chip_design WHERE GATC_chip_design.id = '375173') to produce the final java.sql.ResultSet]
Figure 5. DBDelegator object handling a simple query.
The results from the final aggregate query are sent back to the user and the tables
in the local database are dropped. The connections to the databases are dropped, and this
marks the end of the federated query execution. Figure 5 above demonstrates how the
DBDelegator object handles a simple query.
JDBCHandler
The JDBCHandler is an object that the federation platform uses to connect to
end-databases. The handler used in the current implementation, called the
InformixJDBCHandler, was made to handle Informix databases. As mentioned
before, the main methods in the class are generalizable since JDBC is used. The only
portions specific to Informix are some members that Informix requires for connectivity.
The main function of the JDBCHandler object was to modularize connectivity into
a simple object that could be instantiated. With such a handler, retrieving and copying
result sets are simplified into single method calls. Additionally, once the handler is
instantiated and connected to the database, the federation platform can always access
the database without having to create a new connection for each use.
To retrieve an object that contains the information returned from a query, a
java.sql.ResultSet object, only the SQL statement string has to be passed into the
getResultSet() method of the JDBCHandler object. The JDBCHandler object handles all
of the database authentication and connectivity. To insert a result set into the local
database, only the ResultSet object from another database needs to be passed into the
insertResultSet() method of the handler. The JDBCHandler then looks through the
metadata of the ResultSet object and generates a SQL statement to create a table with
same-typed and same-named columns on the local database. From there, each row is
copied one field at a time until the entire contents of the original ResultSet object are
transferred. This is all performed transparently to the owner of the JDBCHandler; all
that needs to be known is that a copy of the ResultSet now resides on the local database.
Through the use of the JDBCHandler object, access to tables within the
federation platform is greatly simplified. This interface reduces the transactions to a few
simple methods that separate the owner of the handler object from the database
connectivity. This greatly reduces the clutter in the code as well as the number of errors
that could result from improper JDBC operations. A sketch of the metadata-driven copy
described above follows.
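The following is a minimal sketch of such a metadata-driven copy, not the thesis code: the type mapping and identifier quoting are simplified, and the method and table names are illustrative.

import java.sql.*;

// Sketch only: create a same-shaped local table from result metadata,
// then copy each row one field at a time.
public class CopyResultSetSketch {
    static void copy(ResultSet rs, Connection local, String newTable) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();

        // Build CREATE TABLE and INSERT statements from the result metadata.
        StringBuffer create = new StringBuffer("CREATE TABLE " + newTable + " (");
        StringBuffer insert = new StringBuffer("INSERT INTO " + newTable + " VALUES (");
        for (int i = 1; i <= cols; i++) {
            create.append(md.getColumnName(i)).append(' ').append(md.getColumnTypeName(i));
            insert.append('?');
            if (i < cols) { create.append(", "); insert.append(", "); }
        }
        create.append(')');
        insert.append(')');

        Statement stmt = local.createStatement();
        stmt.executeUpdate(create.toString());
        stmt.close();

        // Copy the rows field by field, as described above.
        PreparedStatement ps = local.prepareStatement(insert.toString());
        while (rs.next()) {
            for (int i = 1; i <= cols; i++) ps.setObject(i, rs.getObject(i));
            ps.executeUpdate();
        }
        ps.close();
    }
}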
All of the system components come together to form the federation platform. The figure
below is a diagram of its architecture.
[Figure: the FederationPlatform, containing the QueryDecomposer, SQLQueryParser, ClassMapRepository, and DBDelegator, accepts a query from the Client and ClassMaps from each source; it connects over the Internet through SQL/JDBC handlers (Informix JDBC handlers for the GATC, HGDB, and local databases; non-Informix JDBC and non-JDBC handlers shown as future features) to each end-database and its ClassMapper]
Figure 6. The federation platform architecture.
VI. Implementation
System: Sun E450 running Solaris 7
Database: Informix Dynamic Server 2000
JDBC driver: Informix (Type 4) JDBC driver version 2.20
Java version: Java 2 SDK, Standard Edition, v 1.3
Network Interface: 100BT Ethernet
ClassMaps
The ClassMap files that were used in the federation platform were the SQL
ClassMaps from William Chuang's work [21]. As mentioned before, the ClassMaps were
extended to contain connectivity information for the databases. This connectivity
information was stored as the header of the ClassMap file. The rest of the file was the
SQL schema of the database. The federation platform parsed this information to extract
the table names for each database.
VII. Discussion
Bugs
The current implementation of the system was tested for single and cross-realm
database queries. For most queries tested, the architecture was able to process the query
from start to finish. However, during the testing trials of the application, several bugs
appeared. The bugs that could not be fixed are as follows:
Malformed ClassMap files
When a ClassMap file was malformed, the federation platform had a difficult time
parsing it to determine the tables in a database. Small amounts of error checking were
incorporated to handle this, but during some executions, the parser could not catch some
malformed ClassMaps.
StringTokenizer Bug
The java.lang.StringTokenizer class exhibited strange behavior when it was used to
parse ClassMap files. If a "CREATE TABLE tablename (...)" SQL statement ended
with a carriage return immediately after the table name and the delimiter used was the
"space" character, the table name would not be tokenized properly. If a space was
inserted before the carriage return, or no carriage return was used, then the tokenizer
would function properly. When the tokens were printed to the screen, the table names
looked the same whether a carriage return followed or not. However, during a String
equality test, the two failed to match. The likely cause is that the token retains a trailing
carriage-return character, which is invisible when printed but makes the strings unequal.
This bug should be documented in Sun's Java bug reports.
Large Data Sets
Since JDBC is known to perform poorly with very large data sets, fetching large
sets of data from end-databases should be avoided. In the future, a solution can be
implemented to handle very large data sets. It is reported in Javasoft's bug reports [50]
that objects exceeding several megabytes in size have often produced strange behavior.
As JDBC matures, this should become less of a problem. Future implementations of the
system should avoid the use of large JDBC objects.
Dropping Tables
After the results are returned to the user, the federation platform attempts to drop
the tables created in the local database. Currently, the database returns an error stating
that it cannot drop the tables. This is likely due to locks being held because of the
write operations executed on the tables. The DBDelegator.java class should be
investigated to make sure that all locks on tables are released before the system requests
that the tables be dropped. A sketch of such cleanup follows.
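A minimal sketch of such cleanup, not the thesis code; the table name is illustrative:

import java.sql.*;

// Sketch only: commit any open transaction (releasing write locks held by
// prior insertions) before dropping a temporary local table.
public class DropTableSketch {
    static void dropLocalTable(Connection conn, String tableName) throws SQLException {
        if (!conn.getAutoCommit()) {
            conn.commit();  // end the transaction that holds the table locks
        }
        Statement stmt = conn.createStatement();
        stmt.executeUpdate("DROP TABLE " + tableName);  // e.g. "GATC_chip_design"
        stmt.close();
    }
}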
Future Improvements
Threading capabilities
To increase the performance of the system, threading could be implemented.
Specifically, the federation platform could spawn threads to fetch data from end-databases
in parallel instead of fetching the data sequentially. This could require a significant
amount of work to manage the threads, but the overall system performance would
increase dramatically since the bottleneck in the system is the speed at which tables are
fetched from end-databases. A sketch of this approach follows.
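A minimal sketch of this approach, using the Thread API available to the Java 1.3-era implementation; fetchTables() is a hypothetical stand-in for the per-database fetch loop:

import java.util.*;

// Sketch only: fetch from each end-database on its own thread, then wait
// for every fetch before building the aggregate query.
public class ParallelFetchSketch {
    static void fetchAll(final List dbNames) throws InterruptedException {
        List threads = new ArrayList();
        for (Iterator it = dbNames.iterator(); it.hasNext(); ) {
            final String dbName = (String) it.next();
            Thread t = new Thread(new Runnable() {
                public void run() {
                    fetchTables(dbName);   // hypothetical per-database fetch
                }
            });
            t.start();                     // fetches proceed in parallel
            threads.add(t);
        }
        for (Iterator it = threads.iterator(); it.hasNext(); ) {
            ((Thread) it.next()).join();   // wait for all fetches to finish
        }
    }

    static void fetchTables(String dbName) { /* placeholder for the fetch loop */ }
}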
Security
The current implementation of the system is lacking a solid security model. The
authentication information for the end-databases could be readable by the outside world.
Developing a more sophisticated authentication procedure could improve the security of
the system in the future.
Query Optimization
There are many documents and papers with different strategies about query
optimizations [48] [49]. The current implementation of the federation platform has
limited amounts of optimizations. However, to fine-tune the system so that it fetches
only the requiredtable information from the end-databases, query optimizations must be
implemented into the system. This topic can get extremely complicated since there is a
large math component involved in optimizing and handling data sets. Query
optimizations should be incorporated in future implementations of the system to help
increase the overall performance of the system.
Deploying the Federation Platform
The Informix JDBC driver must be included in the classpath during the runtime of
the federation platform. To start the platform, the Java VM needs to run the
FederationPlatform.class file. See the Appendix section for the code files.
Bibliography
[1] N. Dao, P.J. McCormick, C.F. Dewey, Jr. The human physiome as an information environment. Annals of Biomedical Engineering. 2000.
[2] P.J. McCormick. Designing Object-Oriented Interfaces for Medical Data Repositories. M.Eng. thesis, MIT. 1999.
[3] J. Grimson, W. Grimson, D. Berry, G. Stephens, E. Felton, D. Kalra, P. Toussaint, and O.W. Weier. A CORBA-based Integration of Distributed Electronic Healthcare Records Using the Synapse Approach. IEEE Transactions on Information Technology in Biomedicine. September 1998, 124-138.
[4] M. Hakman and T. Groth. Object-Oriented Biomedical System Modeling - The Rationale. Computer Methods and Programs in Biomedicine Vol 59. 1999. pp 1-17.
[5] C.C. Talbot Jr., A.J. Cuticchia. Human Mapping Databases. Current Protocols in Human Genetics 1.13.1-1.13.12. John Wiley & Sons, Inc. http://gdbwww.gdb.org/. 1999.
[6] Human Genome Project. Report of the Invitational DOE Workshop on Genome Informatics. http://www.ornl.gov/hgmis/publicat/miscpubs/bioinfo/inLrep2.html. April 1993.
[7] National Center for Biotechnology Information. GenBank Overview. http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html. 1999.
[8] John Macauley, Huajun Wang, and Nathan Goodman. A model system for studying the integration of molecular biology databases. Bioinformatics Journal. Vol 14, No. 7, pp 575-582. 1998.
[9] Nathan Goodman, Steve Rozen, and Lincoln Stein. The Case for Componentry in Genome Information Systems. Meeting on Interconnection of Molecular Biology Databases, Stanford University, 1994.
[10] Tatiana A. Tatusova, Ilene Karsch-Mizrachi, and James A. Ostell. Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics Journal. Vol 15, No. 7/8, pp 536-543. 1999.
[11] Jens Hanke, Gerrit Lehmann, Peer Bork, and Jens G. Reich. Associative database of protein sequences. Bioinformatics Journal. Vol 15, No. 9, pp 741-748. 1999.
[12] E. Barillot, U. Leser, P. Lijnzaad, C. Cussat-Blanc, K. Jungfer, F. Guyon, G. Vaysseix, C. Helgesen, and P. Rodriguez-Tomé. A Proposal for a Standard CORBA Interface for Genome Maps. Bioinformatics Journal. Vol 15, No. 2, pp 157-169. 1999.
[13] Oak Ridge National Laboratory. A Distributed Consortium for High-Throughput Analysis and Annotation of Genomes. http://grail.lsd.ornl.gov/gac/. 1999.
[14] Human Genome Project. Genome Glossary. http://www.ornl.gov/hgmis/publicat/glossary.html. July 1999.
[15] Bruce Birren, Eric Green, Phil Hieter, Sue Klapholz, and Rick Myers, eds. Genome Analysis: A Laboratory Manual. Cold Spring Harbor Laboratory Press, 1996.
[16] Stanford Human Genome Center. Human and Saccharomyces Genome Glossaries. http://www-shgc.stanford.edu/About/faq/glossary.html, http://genome-www.stanford.edu/Saccharomyces/help/glossary.html.
[17] LaTanya Sweeney. Towards the Optimal Suppression of Details when Disclosing Medical Data. Proceedings of MEDINFO 98, International Medical Informatics Association. Seoul, Korea. North-Holland. p. 1157, 1998.
[18] LaTanya Sweeney. Datafly: A System for Providing Anonymity in Medical Data. Database Security, XI: Status and Prospects (T. Lin and S. Qian, eds.), Elsevier Science, Amsterdam. Chapter 22, 1998.
[19] LaTanya Sweeney. Replacing Personally-Identifying Information in Medical Records, the Scrub System. Proceedings, Journal of the American Medical Informatics Association (J.J. Cimino, ed), Washington, DC: Hanley & Belfus, Inc., pp. 333-337, 1996.
[20] http://cbil.humgen.upenn.edu/epodb/
[21] W. Chuang. Design of a Genetics Database for Medical Research. M.Eng. thesis, MIT. 2000.
[22] S.K. Moore. Harmonizing Data, Setting Standards. IEEE Spectrum vol. 38, issue 1, pp. 111-112. January 2001.
[23] J. Kohler, M. Lange, R. Hofestadt, S. Schulze-Kremer. Logical and Semantic Database Integration. Bio-Informatics and Biomedical Engineering, pp. 77-80. Proceedings of the IEEE International Symposium, Nov. 8-10, 2000.
[24] A. Freier, R. Hofestadt, M. Lange, and U. Scholz. MARGBench - An Approach for Integration, Modeling and Animation of Metabolic Networks. Proceedings of the German Conference on Bioinformatics, Hannover, 1999.
[25] P.D. Karp. A Strategy for Database Interoperation. Journal of Computational Biology, vol. 2, pp. 573-586, 1995.
[26] B. Reinwald, H. Pirahesh, G. Krishnamoorthy, G. Lapis, B. Tran, S. Vora. Heterogeneous query processing through SQL table functions. Proceedings of the 15th International Conference on Data Engineering, pp. 366-373, 1999.
[27] G.J.L. Kemp, N. Angelopoulos, P.M.D. Gray. A schema-based approach to building a bioinformatics database federation. Proceedings of the IEEE International Symposium on Bio-Informatics and Biomedical Engineering, 2000.
[28] R.J. Robbins. Bioinformatics: Essential infrastructure for global biology. Journal of Computational Biology, issue 3, pp. 465-478, 1996.
[29] M. Garcia-Solaco, M. Castellanos, F. Saltor. Discovering interdatabase resemblance of classes for interoperable databases. Research Issues in Data Engineering, 1993: Interoperability in Multidatabase Systems, 1993.
[30] A. Dogac, C. Dengi, E. Kilic, G. Ozhan, F. Ozcan, S. Nural, C. Evrendilek, U. Halici, B. Arpinar, P. Koksal, S. Mancuhan. A multidatabase system implementation on CORBA. Research Issues in Data Engineering, pp. 2-11, 1996.
[31] K. Hergula, T. Harder. A middleware approach for combining heterogeneous data sources - integration of generic query and predefined function access. Web Information Systems Engineering, pp. 26-33, vol. 1, 2000.
[32] D.D. Karunaratna, W.A. Gray, N.J. Fiddian. Exploitation of database meta-data in assisting database interoperation. IEE Colloquium on Multimedia Databases and MPEG-7 (Ref. No. 1999/056), pp. 12/1-12/5, 1999.
[33] H. Huang, J. Kerridge, S. Chen. A query mediation approach to interoperability of heterogeneous databases. Proceedings of the 11th Australasian Database Conference, pp. 41-48, 2000.
[34] W. Meng and C. Yu. Query Processing in Multidatabase Systems. Modern Database Systems: The Object Model, Interoperability, and Beyond, pp. 551-572. Addison-Wesley, 1995.
[35] A. d'Ambrogio, G. Iazeolla. A CORBA-based approach to design gateways for multidatabase systems. Enabling Technologies: Infrastructure for Collaborative Enterprises, pp. 49-54, 1997.
[36] O. Jautzy. Interoperable databases: a programming language approach. Proceedings of the IDEAS '99 International Symposium on Database Engineering and Applications, pp. 63-71, 1999.
[37] S.B. Yoo, K.C. Kim, S.K. Cha. A middleware implementation of active rules for ODBMS. Proceedings of the 6th International Conference on Database Systems for Advanced Applications, pp. 347-354, 1999.
[38] A.R. Hurson, M.W. Bright, S.H. Pakzad. Multidatabase Systems: An Advanced Solution for Global Information Sharing. IEEE Computer Society Press, 1994.
[39] B. Forta. Sams Teach Yourself SQL in 10 Minutes. Sams Publishing, 2000.
[40] A. Elmagarmid, M. Rusinkiewicz, A. Sheth. Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann Publishers Inc., 1999.
[41] http://sdmc.krdl.org.sg:8080/kleisli/
[42] http://www.genetic-exchange.com/
[43] http://srs6.ebi.ac.uk/
[44] http://edradour.cs.uni-magdeburg.de/iti_brm/marg/
[45] http://www.affymetrix.com/
[46] C.S. Horstmann, G. Cornell. Core Java 2, Volume 1: Fundamentals. Prentice Hall PTR/Sun Microsystems Press, 2000.
[47] J. Grant, J. Gryz, J. Minker, L. Raschid. Logic-based query optimization for object databases. IEEE Transactions on Knowledge and Data Engineering, vol. 12, issue 4, pp. 529-547, 2000.
[48] J. Claussen, A. Kemper, G. Moerkotte, K. Peithner, M. Steinbrunn. Optimization and evaluation of disjunctive queries. IEEE Transactions on Knowledge and Data Engineering, vol. 12, issue 2, pp. 238-260, 2000.
[50] http://www.javasoft.com/
APPENDIX
FederationPlatform.java

import java.sql.*;
import java.util.*;

public class FederationPlatform {

    private Vector registeredDatabaseAdapters;
    private QueryDecomposer queryDecomposer;

    public FederationPlatform() {
        registeredDatabaseAdapters = new Vector();
        queryDecomposer = new QueryDecomposer();
    }

    public static void main(String args[]) {
        FederationPlatform federationPlatform = new FederationPlatform();
        federationPlatform.initialize(args);
    }

    public void initialize(String args[]) {
        // Load the ClassMaps named on the command line, then run a query.
        ClassMapRepository classMapRepository = new ClassMapRepository();
        classMapRepository.processClassMapFile(args[0]);
        classMapRepository.processClassMapFile(args[1]);
        queryDecomposer.setClassMapRepository(classMapRepository);

        acceptQuery("PUT QUERY STRING HERE");

        DBDelegator dbDelegator = new DBDelegator(queryDecomposer.getDistributedQuery());
        dbDelegator.dropLocalTables();
    }

    public void acceptQuery(String queryString) {
        queryDecomposer.processQuery(queryString);
    }
}
ClassMapRepository.java

import java.util.*;
import java.io.*;

public class ClassMapRepository {

    Hashtable federatedClassMapHashtable;  // table name -> database alias
    Hashtable classMapHashtable;           // database alias -> ClassMap

    public ClassMapRepository() {
        federatedClassMapHashtable = new Hashtable();
        classMapHashtable = new Hashtable();
    }

    public Object putClassMap(String dbName, ClassMap classMap) {
        // Returns the previous value of the specified key in this hashtable,
        // or null if it did not have one.
        return classMapHashtable.put(dbName, classMap);
    }

    public Object getClassMap(String dbName) {
        // Returns the value to which the key is mapped in this hashtable;
        // null if the key is not mapped to any value in this hashtable.
        return classMapHashtable.get(dbName);
    }

    public Object removeClassMap(String dbName) {
        // Returns the value to which the key had been mapped in this hashtable,
        // or null if the key did not have a mapping.
        return classMapHashtable.remove(dbName);
    }

    public Enumeration getDBEnumeration() {
        return classMapHashtable.keys();
    }

    public Object putTableDB(String tableName, String dbName) {
        // Returns the previous value of the specified key in this hashtable,
        // or null if it did not have one.
        Object object = federatedClassMapHashtable.put(tableName, dbName);
        if (object != null) {
            // throw TableAlreadyExistsException!!!
        }
        return object;
    }

    public Object getDBPath(String tableName) {
        // throws TableDoesNotExistException!!!
        // Returns the value to which the key is mapped in this hashtable;
        // null if the key is not mapped to any value in this hashtable.
        return federatedClassMapHashtable.get(tableName);
    }

    public Object removeTableDB(String tableName) {
        // throws TableDoesNotExistException!!!
        // Returns the value to which the key had been mapped in this hashtable,
        // or null if the key did not have a mapping.
        return federatedClassMapHashtable.remove(tableName);
    }

    public Enumeration getTablesEnumeration() {
        return federatedClassMapHashtable.keys();
    }

    private boolean parseClassMap(String classMapString) {
        // Tokenize by lines.
        StringTokenizer stringLineTokenizer = new StringTokenizer(classMapString, "\n");
        try {
            // First read the connectivity header.
            ClassMap classMap = new ClassMap();
            while (stringLineTokenizer.hasMoreTokens()
                    && (classMap.isAllParametersSet() != true)) {
                String currentLine = stringLineTokenizer.nextToken();
                StringTokenizer stringWordTokenizer = new StringTokenizer(currentLine, " ");
                String currentWord;
                while (stringWordTokenizer.hasMoreElements()) {
                    currentWord = stringWordTokenizer.nextToken();
                    if ("DATABASE_ALIAS:".equalsIgnoreCase(currentWord)) {
                        classMap.setDatabaseAlias(stringWordTokenizer.nextToken());
                    } else if ("CONNECTIVITY:".equalsIgnoreCase(currentWord)) {
                        classMap.setConnectivity(stringWordTokenizer.nextToken());
                    } else if ("DATABASE_IP:".equalsIgnoreCase(currentWord)) {
                        classMap.setIP(stringWordTokenizer.nextToken());
                    } else if ("PORT:".equalsIgnoreCase(currentWord)) {
                        classMap.setPort(stringWordTokenizer.nextToken());
                    } else if ("DATABASE_NAME:".equalsIgnoreCase(currentWord)) {
                        classMap.setDatabaseName(stringWordTokenizer.nextToken());
                    } else if ("ADDITIONAL_PARAMETERS:".equalsIgnoreCase(currentWord)) {
                        classMap.setAdditionalParameters(stringWordTokenizer.nextToken());
                    } else if ("AUTHENTICATION(user):".equalsIgnoreCase(currentWord)) {
                        classMap.setUser(stringWordTokenizer.nextToken());
                    } else if ("AUTHENTICATION(password):".equalsIgnoreCase(currentWord)) {
                        classMap.setPassword(stringWordTokenizer.nextToken());
                    }
                }
            }
            putClassMap(classMap.getDatabaseAlias(), classMap);

            // Then read the schema: every "CREATE TABLE <name>" maps the table
            // to this ClassMap's database ALIAS (not its database NAME).
            while (stringLineTokenizer.hasMoreTokens()) {
                String currentLine = stringLineTokenizer.nextToken();
                StringTokenizer stringWordTokenizer = new StringTokenizer(currentLine, " ");
                String currentWord;
                while (stringWordTokenizer.hasMoreElements()) {
                    currentWord = stringWordTokenizer.nextToken();
                    if ("CREATE".equalsIgnoreCase(currentWord)
                            && stringWordTokenizer.hasMoreElements()) {
                        currentWord = stringWordTokenizer.nextToken();
                        if ("TABLE".equalsIgnoreCase(currentWord)
                                && stringWordTokenizer.hasMoreElements()) {
                            // This should be the table name. Assigning the
                            // trimmed result removes any trailing carriage
                            // return that would otherwise make hashtable
                            // lookups fail (see the StringTokenizer bug in
                            // the Discussion section).
                            currentWord = stringWordTokenizer.nextToken().trim();
                            putTableDB(currentWord, classMap.getDatabaseAlias());
                        }
                    }
                }
            }
        } catch (NoSuchElementException e) {
            System.out.println("ERROR: NoSuchElementException: " + e.getMessage());
            return false;
        }
        return true;
    }

    public boolean processClassMap(String classMapString) {
        return parseClassMap(classMapString);
    }

    public boolean processClassMapFile(String fileName) {
        try {
            StringBuffer sb = new StringBuffer();
            FileReader fileReader = new FileReader(fileName);
            int currentread = fileReader.read();
            while (currentread != -1) {
                sb.append((char) currentread);
                currentread = fileReader.read();
            }
            return parseClassMap(sb.toString());
        } catch (FileNotFoundException e) {
            System.out.println("ERROR: File Not Found Exception: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("ERROR: IOException: " + e.getMessage());
        }
        return false;  // did not complete
    }
}
ClassMap.java

public class ClassMap {

    private String databaseAlias;
    private String databaseName;
    private String connectivity;
    private String IP;
    private String port;
    private String additionalParameters;
    private String user;
    private String password;

    public ClassMap() {
    }

    public void setDatabaseAlias(String databaseAlias) { this.databaseAlias = databaseAlias; }
    public String getDatabaseAlias() { return databaseAlias; }

    public void setDatabaseName(String databaseName) { this.databaseName = databaseName; }
    public String getDatabaseName() { return databaseName; }

    public void setConnectivity(String connectivity) { this.connectivity = connectivity; }
    public String getConnectivity() { return connectivity; }

    public void setIP(String IP) { this.IP = IP; }
    public String getIP() { return IP; }

    public void setPort(String port) { this.port = port; }
    public String getPort() { return port; }

    public void setAdditionalParameters(String additionalParameters) { this.additionalParameters = additionalParameters; }
    public String getAdditionalParameters() { return additionalParameters; }

    public void setUser(String user) { this.user = user; }
    public String getUser() { return user; }

    public void setPassword(String password) { this.password = password; }
    public String getPassword() { return password; }

    public boolean isAllParametersSet() {
        // True once every connectivity parameter from the header has been read.
        return (databaseAlias != null) && (databaseName != null)
                && (connectivity != null) && (IP != null) && (port != null)
                && (user != null) && (password != null);
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("databaseAlias=").append(databaseAlias);
        sb.append(" databaseName=").append(databaseName);
        sb.append(" connectivity=").append(connectivity);
        sb.append(" IP=").append(IP);
        sb.append(" port=").append(port);
        sb.append(" additionalParameters=").append(additionalParameters);
        sb.append(" user=").append(user);
        sb.append(" password=").append(password);
        return sb.toString();
    }
}
DistributedQuery.java

import java.util.*;

public class DistributedQuery {

    private String query;
    private String dbPathQuery;
    private String aggregateQuery;
    private Hashtable monoDBQueryHashtable;
    private Vector aggregateSelectVector;
    private Vector aggregateFromVector;
    private Vector aggregateWhereVector;
    private ClassMapRepository classMapRepository;

    public DistributedQuery() {
        this("");
    }

    public DistributedQuery(String queryString) {
        query = queryString;
        monoDBQueryHashtable = new Hashtable();
        aggregateSelectVector = new Vector();
        aggregateFromVector = new Vector();
        aggregateWhereVector = new Vector();
    }

    public void setClassMapRepository(ClassMapRepository classMapRepository) {
        this.classMapRepository = classMapRepository;
    }
    public ClassMapRepository getClassMapRepository() { return classMapRepository; }

    public void setQueryString(String queryString) { query = queryString; }
    public String getQueryString() { return query; }

    public void setDBPathQueryString(String dbPathQuery) { this.dbPathQuery = dbPathQuery; }
    public String getDBPathQueryString() { return dbPathQuery; }

    public void setAggregateQueryString(String aggregateQuery) { this.aggregateQuery = aggregateQuery; }
    public String getAggregateQueryString() { return aggregateQuery; }

    public Object putMonoDBQuery(String dbName, SQLMonoDBQuery dbQuery) {
        // Returns the previous value of the specified key in this hashtable,
        // or null if it did not have one.
        return monoDBQueryHashtable.put(dbName, dbQuery);
    }

    public Object removeMonoDBQuery(String dbName) {
        // Returns the value to which the key had been mapped in this hashtable,
        // or null if the key did not have a mapping.
        return monoDBQueryHashtable.remove(dbName);
    }

    public Object getMonoDBQuery(String dbName) {
        return monoDBQueryHashtable.get(dbName);
    }

    public Enumeration getMonoDBKeys() {
        return monoDBQueryHashtable.keys();
    }

    public void addSelect(String selectString) {
        aggregateSelectVector.addElement(selectString);
    }

    public void removeSelect(String selectString) {
        // String must be the exact same reference to be removed.
        aggregateSelectVector.removeElement(selectString);
    }

    public Enumeration getSelectEnumeration() {
        return aggregateSelectVector.elements();
    }

    public void addFrom(String fromString) {
        aggregateFromVector.addElement(fromString);
    }

    public void removeFrom(String fromString) {
        // String must be the exact same reference to be removed.
        aggregateFromVector.removeElement(fromString);
    }

    public Enumeration getFromEnumeration() {
        return aggregateFromVector.elements();
    }

    public void addWhere(String whereString) {
        aggregateWhereVector.addElement(whereString);
    }

    public void removeWhere(String whereString) {
        // String must be the exact same reference to be removed.
        aggregateWhereVector.removeElement(whereString);
    }

    public Enumeration getWhereEnumeration() {
        return aggregateWhereVector.elements();
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("-=DistributedQuery=- query=" + query + "\n");
        sb.append("dbPathQuery=" + dbPathQuery + "\n");
        sb.append("aggregateQuery=" + aggregateQuery + "\n");
        sb.append("---------------------------------\n");
        Enumeration dbKeys = getMonoDBKeys();
        while (dbKeys.hasMoreElements()) {
            String monoDBQueryString = (String) dbKeys.nextElement();
            SQLMonoDBQuery monoDBQuery = (SQLMonoDBQuery) getMonoDBQuery(monoDBQueryString);
            sb.append(monoDBQuery.toString());
        }
        return sb.toString();
    }
}
SQLMonoDBQuery.java

import java.util.*;

public class SQLMonoDBQuery {
    // This data object holds all queries for one database.

    private String databaseNameString;
    private Hashtable tableQueryHashtable;  // table name -> SQLTableQuery

    public SQLMonoDBQuery() {
        tableQueryHashtable = new Hashtable();
    }

    public void setDatabaseName(String databaseNameString) {
        this.databaseNameString = databaseNameString;
    }

    public String getDatabaseName() {
        return databaseNameString;
    }

    public Object putTableQuery(String tableName, SQLTableQuery tableQuery) {
        // Returns the previous value of the specified key in this hashtable,
        // or null if it did not have one.
        return tableQueryHashtable.put(tableName, tableQuery);
    }

    public Object removeTableQuery(String tableName) {
        // Returns the value to which the key had been mapped in this hashtable,
        // or null if the key did not have a mapping.
        return tableQueryHashtable.remove(tableName);
    }

    public Object getTableQuery(String tableName) {
        return tableQueryHashtable.get(tableName);
    }

    public Enumeration getTableKeys() {
        return tableQueryHashtable.keys();
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        Enumeration tableKeys = getTableKeys();
        sb.append("-=SQLMonoDBQuery=- databaseName=" + databaseNameString + "\n");
        sb.append("---------------------------------------------------\n");
        while (tableKeys.hasMoreElements()) {
            String tableKeyString = (String) tableKeys.nextElement();
            SQLTableQuery tableQuery = (SQLTableQuery) getTableQuery(tableKeyString);
            sb.append(tableQuery.toString() + "\n\n");
        }
        return sb.toString();
    }
}
SQLTableQuery.java

import java.util.*;

public class SQLTableQuery {
    // This class is a data object that contains one query to a database.

    private String tableNameString;
    private String DBNameString;
    private Vector selectVector;
    private Vector fromVector;
    private Vector whereVector;

    public SQLTableQuery() {
        selectVector = new Vector();
        fromVector = new Vector();
        whereVector = new Vector();
    }

    public void setTableName(String tableNameString) { this.tableNameString = tableNameString; }
    public String getTableName() { return tableNameString; }

    public void setDBName(String dbName) { this.DBNameString = dbName; }
    public String getDBName() { return DBNameString; }

    public int findElementIndex(Vector vector, String findString) {
        // Returns the index of the element in the Vector that is equivalent
        // to findString (CASE SENSITIVE!).
        // If no index is found, returns -1.
        // If the string exists only in the wrong case, returns -2.
        boolean foundWrongCase = false;
        for (int i = 0; i < vector.size(); i++) {
            if (((String) vector.elementAt(i)).equals(findString)) {
                return i;
            } else if (((String) vector.elementAt(i)).equalsIgnoreCase(findString)) {
                foundWrongCase = true;  // exists, but in wrong case
            }
        }
        if (foundWrongCase) {
            return -2;  // exists, but in wrong case
        }
        return -1;      // not in vector
    }

    public int findElementIndexIgnoreCase(Vector vector, String findString) {
        // Returns the index of the element in the Vector that is equivalent
        // to findString, ignoring case. If no index is found, returns -1.
        for (int i = 0; i < vector.size(); i++) {
            if (((String) vector.elementAt(i)).equalsIgnoreCase(findString)) {
                return i;
            }
        }
        return -1;  // not in vector
    }

    public boolean addSelect(String selectString) {
        int elementIndex = findElementIndexIgnoreCase(selectVector, selectString);
        if (elementIndex == -1) {  // if the string isn't in the vector
            selectVector.addElement(selectString);
            return true;
        }
        return false;
    }

    public boolean removeSelect(String selectString) {
        int elementIndex = findElementIndexIgnoreCase(selectVector, selectString);
        if (elementIndex != -1) {
            selectVector.remove(elementIndex);
            return true;
        }
        return false;
    }

    public Enumeration getSelectEnumeration() { return selectVector.elements(); }

    public boolean addFrom(String fromString) {
        int elementIndex = findElementIndexIgnoreCase(fromVector, fromString);
        if (elementIndex == -1) {  // if the string isn't in the vector
            fromVector.addElement(fromString);
            return true;
        }
        return false;
    }

    public boolean removeFrom(String fromString) {
        int elementIndex = findElementIndexIgnoreCase(fromVector, fromString);
        if (elementIndex != -1) {
            fromVector.remove(elementIndex);
            return true;
        }
        return false;
    }

    public Enumeration getFromEnumeration() { return fromVector.elements(); }

    public boolean addWhere(String whereString) {
        int elementIndex = findElementIndex(whereVector, whereString);
        if (elementIndex == -1) {  // if the string isn't in the vector
            whereVector.addElement(whereString);
            return true;
        }
        return false;
    }

    public boolean removeWhere(String whereString) {
        int elementIndex = findElementIndex(whereVector, whereString);
        if ((elementIndex != -1) && (elementIndex != -2)) {
            whereVector.remove(elementIndex);
            return true;
        }
        return false;
    }

    public Enumeration getWhereEnumeration() { return whereVector.elements(); }

    public String toSQLString() {
        // Assemble the decomposed per-table query as a SQL string.
        StringBuffer sb = new StringBuffer();
        sb.append("SELECT ");
        for (int i = 0; i < selectVector.size(); i++) {
            sb.append((String) selectVector.elementAt(i));
            sb.append(i < selectVector.size() - 1 ? ", " : "\n");
        }
        sb.append("FROM ");
        for (int i = 0; i < fromVector.size(); i++) {
            sb.append((String) fromVector.elementAt(i));
            sb.append(i < fromVector.size() - 1 ? ", " : "\n");
        }
        if (whereVector.size() != 0) {
            sb.append("WHERE ");
            for (int i = 0; i < whereVector.size(); i++) {
                sb.append((String) whereVector.elementAt(i));
                sb.append(i < whereVector.size() - 1 ? ", " : "\n");
            }
        }
        return sb.toString();
    }

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("-=SQLTableQuery=- \n");
        sb.append("DBName=" + DBNameString + "\n");
        sb.append("TableName=" + tableNameString + "\n");
        sb.append(toSQLString());
        return sb.toString();
    }
}
QueryDecomposer.java

import java.util.*;

public class QueryDecomposer {
    private Vector monoQueryVector;
    private SQLQueryParser queryParser;
    private DistributedQuery distributedQuery;
    private ClassMapRepository classMapRepository;

    public QueryDecomposer() {
        monoQueryVector = new Vector();
        queryParser = new SQLQueryParser();
    }

    public Vector getMonoDatabaseQueryVector() {
        return monoQueryVector; // return a clone of this?
    }

    public DistributedQuery getDistributedQuery() {
        return distributedQuery;
    }

    public ClassMapRepository getClassMapRepository() {
        return classMapRepository;
    }

    public void setClassMapRepository(ClassMapRepository classMapRepository) {
        this.classMapRepository = classMapRepository;
        if (distributedQuery != null)
            distributedQuery.setClassMapRepository(classMapRepository);
    }

    public boolean processQuery(String distributedQueryString) {
        distributedQuery = new DistributedQuery(distributedQueryString);
        distributedQuery.setClassMapRepository(classMapRepository);
        distributedQuery.setDBPathQueryString(addDBPathsToQuery(distributedQueryString));
        queryParser.parse(distributedQuery);
        return true; // needs to signal whether processing went through
    }

    private String addDBPathsToQuery(String queryString) {
        String queryUpperCaseString = queryString.toUpperCase();
        String selectString, fromString, whereString;
        String dbPathSelectString, dbPathFromString, dbPathWhereString;
        StringTokenizer stringTokenizer;
        StringBuffer sb;

        if (queryUpperCaseString.equals("")) {
            // ***throw an exception!
            System.out.print("Empty query string!");
            return "";
        }
        int select_index = queryUpperCaseString.indexOf("SELECT"); // +7
        int from_index = queryUpperCaseString.indexOf("FROM");     // +5
        int where_index = queryUpperCaseString.indexOf("WHERE");   // +6

        if (select_index == -1) {
            // no SELECT keyword
            System.out.println("Malformed Query! No SELECT keyword found.");
            return "";
        } else if (select_index != 0) {
            // malformed query
            System.out.println("Malformed Query! SELECT keyword must be the first word.");
            return "";
        } else {
            selectString = queryString.substring(select_index + 7, from_index);
        }
        if (from_index == -1) {
            // no FROM keyword
            System.out.println("Malformed Query! No FROM keyword found.");
            return "";
        } else if (where_index != -1) {
            // there is a WHERE keyword
            fromString = queryString.substring(from_index + 5, where_index);
        } else {
            // no WHERE keyword
            fromString = queryString.substring(from_index + 5, queryString.length());
        }
        if (where_index == -1)
            whereString = "";
        else if ((where_index + 6) == (queryUpperCaseString.length() - 1))
            whereString = "";
        else
            whereString = queryString.substring(where_index + 6, queryString.length());

        // tokenize strings
        // SELECT
        // for selectString, test if the token is a key in the hashtable, and add the
        // DBPath to all tokens in the hashtable (identified by [TABLE].[COLUMN])
        stringTokenizer = new StringTokenizer(selectString, ",");
        String currentSelect;
        sb = new StringBuffer();
        while (stringTokenizer.hasMoreTokens()) {
            currentSelect = stringTokenizer.nextToken();
            currentSelect = currentSelect.trim();
            if (getTablePath(currentSelect).equalsIgnoreCase("")) {
                // if TablePath not specified
                // throw an EXCEPTION! -- malformed SELECT clause
                System.out.println("ERROR! malformed SELECT clause");
                sb.append(" [MALFORMED]");
            } else {
                String tableName = getTablePath(currentSelect);
                String columnName = cropTablePath(currentSelect);
                String dbName = (String) classMapRepository.getDBPath(tableName);
                if (dbName == null) {
                    // table has no corresponding DB!
                    // throw an EXCEPTION!
                    System.out.println("ERROR! table has no corresponding DB!: tableName=" + tableName);
                    sb.append(" [NOT IN CLASSMAPS]");
                } else {
                    sb.append(dbName + "->" + tableName + "." + columnName);
                }
            }
            if (stringTokenizer.hasMoreTokens())
                sb.append(", ");
        }
        dbPathSelectString = sb.toString();

        // FROM
        // for fromString, test if the token is a key in the hashtable, add the DBPath
        // to all tokens in the hashtable (identified by [TABLE] only)
        stringTokenizer = new StringTokenizer(fromString, ",");
        String currentFrom;
        sb = new StringBuffer();
        while (stringTokenizer.hasMoreTokens()) {
            currentFrom = stringTokenizer.nextToken();
            currentFrom = currentFrom.trim();
            String dbName = (String) classMapRepository.getDBPath(currentFrom);
            if (dbName == null) {
                // table has no corresponding DB!
                // throw an EXCEPTION!
                System.out.println("ERROR! table has no corresponding DB!: tableName=" + currentFrom);
                sb.append(" [NOT IN CLASSMAPS]");
            } else {
                sb.append(dbName + "->" + currentFrom);
            }
            if (stringTokenizer.hasMoreTokens())
                sb.append(", ");
        }
        dbPathFromString = sb.toString();

        // WHERE
        // for whereString, tokenize and test if [TABLE].[COLUMN] is a key in the
        // hashtable, then add the DBPath
        stringTokenizer = new StringTokenizer(whereString, " ");
        String currentWhere;
        sb = new StringBuffer();
        while (stringTokenizer.hasMoreTokens()) {
            currentWhere = stringTokenizer.nextToken();
            currentWhere = currentWhere.trim();
            if (getTablePath(currentWhere).equalsIgnoreCase("") == false) {
                // the token qualifies as a TablePath
                String tableName = getTablePath(currentWhere);
                String columnName = cropTablePath(currentWhere);
                String dbName = (String) classMapRepository.getDBPath(tableName);
                if (dbName == null) {
                    // table has no corresponding DB!
                    // throw an EXCEPTION!
                    System.out.println("ERROR! table has no corresponding DB!: tableName=" + tableName);
                    sb.append(" [NOT IN CLASSMAPS]");
                } else {
                    sb.append(dbName + "->" + tableName + "." + columnName);
                }
            } else {
                sb.append(currentWhere);
            }
            if (stringTokenizer.hasMoreTokens())
                sb.append(" "); // to space out words
        }
        dbPathWhereString = sb.toString();

        sb = new StringBuffer();
        sb.append("SELECT ");
        sb.append(dbPathSelectString);
        sb.append(" \n FROM ");
        sb.append(dbPathFromString);
        sb.append(" \n WHERE ");
        sb.append(dbPathWhereString);
        return sb.toString(); // FINAL STRING!!!
    }

    private String getTablePath(String pathString) {
        // throws BadTableException?
        int path_index = pathString.indexOf(".");
        if (path_index == -1)
            return ""; // throw BadTableException
        else {
            String DBPathString = pathString.substring(0, path_index);
            return DBPathString;
        }
    }

    private String cropTablePath(String pathString) {
        // throws BadTableException?
        int path_index = pathString.indexOf(".");
        if (path_index == -1)
            return ""; // throw BadTableException
        else {
            String croppedDBString = pathString.substring(path_index + 1, pathString.length());
            return croppedDBString;
        }
    }
}
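A minimal usage sketch (illustrative only, not part of the thesis listings): it assumes a ClassMapRepository that already maps a hypothetical table GENES to a database alias gdb, and shows where the DBPath-annotated form produced by addDBPathsToQuery() ends up.

// Illustrative sketch -- assumes a populated ClassMapRepository (setup not shown here).
ClassMapRepository repository = new ClassMapRepository();
QueryDecomposer decomposer = new QueryDecomposer();
decomposer.setClassMapRepository(repository);
// Plain SQL naming [TABLE].[COLUMN] pairs; GENES must be registered in the class maps.
decomposer.processQuery("SELECT GENES.NAME FROM GENES WHERE GENES.LOCUS = '7q31'");
// The annotated form, e.g. "SELECT gdb->GENES.NAME \n FROM gdb->GENES ...",
// now travels with the DistributedQuery object.
DistributedQuery annotated = decomposer.getDistributedQuery();
System.out.println(annotated.getDBPathQueryString());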
SQLQueryParser.java

import java.util.*;

public class SQLQueryParser {
    private String selectString;
    private String fromString;
    private String whereString;
    private String queryString;
    private String queryUpperCaseString;
    private DistributedQuery distributedQuery;

    public SQLQueryParser() {
        queryString = "";
    }

    public SQLQueryParser(String queryString) {
        parse(queryString);
    }

    public SQLQueryParser(DistributedQuery distributedQuery) {
        this.distributedQuery = distributedQuery;
        parse(distributedQuery);
    }

    public void parse(DistributedQuery distributedQuery) {
        this.distributedQuery = distributedQuery;
        parse(distributedQuery.getDBPathQueryString());
    }

    public void parse(String queryString) {
        this.queryString = queryString;
        if (queryString == null) {
            System.out.println("Null query string!");
            return;
        }
        this.queryUpperCaseString = queryString.toUpperCase();
        extractClauses();
        calculateRequiredTables();
    }

    public void extractClauses() {
        // need to make private
        if (queryUpperCaseString.equals("")) {
            // ***throw an exception!
            System.out.print("Empty query string!");
            return;
        }
        int select_index = queryUpperCaseString.indexOf("SELECT"); // +7
        int from_index = queryUpperCaseString.indexOf("FROM");     // +5
        int where_index = queryUpperCaseString.indexOf("WHERE");   // +6
        if (select_index == -1) {
            // no SELECT keyword
            System.out.println("Malformed Query! No SELECT keyword found.");
            return;
        } else if (select_index != 0) {
            // malformed query
            System.out.println("Malformed Query! SELECT keyword must be the first word.");
            return;
        } else {
            selectString = queryString.substring(select_index + 7, from_index);
        }
        if (from_index == -1) {
            // no FROM keyword
            System.out.println("Malformed Query! No FROM keyword found.");
            return;
        } else if (where_index != -1) {
            // there is a WHERE keyword
            fromString = queryString.substring(from_index + 5, where_index);
        } else {
            // no WHERE keyword
            fromString = queryString.substring(from_index + 5, queryString.length());
        }
        if (where_index == -1)
            whereString = "";
        else if ((where_index + 6) == (queryUpperCaseString.length() - 1))
            whereString = "";
        else
            whereString = queryString.substring(where_index + 6, queryString.length());
    }

    private String clausesToString() {
        StringBuffer sb = new StringBuffer();
        sb.append("SELECT=" + selectString + "\n");
        sb.append("FROM=" + fromString + "\n");
        sb.append("WHERE=" + whereString + "\n");
        return sb.toString();
    }

    public void calculateRequiredTables() {
        String currentSelect;
        String currentFrom;
        String currentWhere;
        StringTokenizer stringTokenizer;
        SQLTableQuery tempTableQuery;
        SQLMonoDBQuery tempMonoDBQuery;
        // Vector columnVector = new Vector();
        Vector fromVector = new Vector();
        Vector whereVector = new Vector();

        // Tables for FROM -- the FROM clause helps enumerate the tables that need to
        // be accessed
        // -- be sure to add aliases later
        stringTokenizer = new StringTokenizer(fromString, ",");
        while (stringTokenizer.hasMoreTokens()) {
            // most basic case of from
            currentFrom = stringTokenizer.nextToken();
            currentFrom = currentFrom.trim();
            if (getDBPath(currentFrom).equalsIgnoreCase("")) {
                // SHOULD NEVER GET HERE... throw exception!
            } else {
                // IMPLEMENT FOR MULTIPLE DBPath's
                String dbName = getDBPath(currentFrom);
                String tableName = cropDBPath(currentFrom);
                tempTableQuery = new SQLTableQuery();
                tempTableQuery.setDBName(dbName);
                tempTableQuery.setTableName(tableName);
                tempTableQuery.addFrom(tableName);
                fromVector.add(tableName);
                if (distributedQuery.getMonoDBQuery(dbName) == null) {
                    // if it doesn't exist, create a new SQLMonoDBQuery object
                    tempMonoDBQuery = new SQLMonoDBQuery();
                    tempMonoDBQuery.setDatabaseName(dbName);
                    distributedQuery.putMonoDBQuery(dbName, tempMonoDBQuery);
                } else {
                    tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
                }
                tempMonoDBQuery.putTableQuery(tableName, tempTableQuery);
            }
        }

        // Tables for SELECT
        stringTokenizer = new StringTokenizer(selectString, ",");
        while (stringTokenizer.hasMoreTokens()) {
            // most basic case of selecting
            currentSelect = stringTokenizer.nextToken();
            currentSelect = currentSelect.trim();
            if (getDBPath(currentSelect).equalsIgnoreCase("")) {
                // if DBPath not specified
                // throw an EXCEPTION! -- DBPath's are required
            } else {
                // IMPLEMENT FOR DBPath if given a specific DBPath for the columns
                String dbName = getDBPath(currentSelect);
                String tableColumnName = cropDBPath(currentSelect);
                String tableName = getTablePath(tableColumnName);
                String columnName = cropTablePath(tableColumnName);
                insertSelectIntoTableQuery(dbName, tableName, columnName);
            }
        }

        // Tables for WHERE
        if (!whereString.equals("")) {
            stringTokenizer = new StringTokenizer(whereString, " ");
            StringBuffer sb = new StringBuffer();
            String upperCaseWhereString = whereString.toUpperCase();
            // to find ANDs and ORs easier
            while (stringTokenizer.hasMoreTokens()) {
                currentWhere = stringTokenizer.nextToken();
                currentWhere = currentWhere.trim();
                // NEED TO CARVE OUT MORE LOGIC HERE
                if (currentWhere.equalsIgnoreCase("AND")
                        || currentWhere.equalsIgnoreCase("OR")) {
                    whereVector.add(sb.toString());
                    sb = new StringBuffer();
                }
                sb.append(currentWhere + " ");
            }
            if (!sb.toString().equals("")) {
                if (sb.charAt(sb.length() - 1) == ' ') {
                    sb.deleteCharAt(sb.length() - 1);
                    whereVector.add(sb.toString());
                } else {
                    whereVector.add(sb.toString());
                }
            }

            String currentCondition;
            String currentToken = "";
            Vector strippedWhereVector = new Vector();
            for (int i = 0; i < whereVector.size(); i++) {
                currentCondition = (String) whereVector.elementAt(i);
                stringTokenizer = new StringTokenizer(currentCondition, " ");
                StringBuffer noDBPathWhere = new StringBuffer();
                int first_index = currentCondition.indexOf("->");
                int last_index = currentCondition.lastIndexOf("->");
                String firstDB = "";
                String firstTable = "";
                String firstColumn = "";
                String secondDB = "";
                String secondTable = "";
                String secondColumn = "";
                while (stringTokenizer.hasMoreTokens()) {
                    currentToken = stringTokenizer.nextToken();
                    if (firstDB.equals("") && (currentToken.indexOf("->") != -1)) {
                        // no DB was found yet
                        firstDB = getDBPath(currentToken);
                        String firstTableColumn = cropDBPath(currentToken);
                        firstTable = getTablePath(firstTableColumn);
                        firstColumn = cropTablePath(firstTableColumn);
                        noDBPathWhere.append(firstTableColumn + " ");
                    } else if ((firstDB.equals("") == false)
                            && (currentToken.indexOf("->") != -1)) {
                        // first DB already found
                        secondDB = getDBPath(currentToken);
                        String secondTableColumn = cropDBPath(currentToken);
                        secondTable = getTablePath(secondTableColumn);
                        secondColumn = cropTablePath(secondTableColumn);
                        noDBPathWhere.append(secondTableColumn + " ");
                    } else {
                        noDBPathWhere.append(currentToken + " ");
                    }
                }
                strippedWhereVector.add(noDBPathWhere.toString());
                if (firstDB.equals("")) {
                    // no DBPaths read!
                    System.out.println("ERROR! MALFORMED WHERE");
                } else if (secondDB.equals("")) {
                    // only one DB is used
                    // SELECT: insert firstColumn into firstDB->firstTable SELECT
                    insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                    // FROM: no changes
                    // WHERE: insert noDBPathWhere into firstDB->firstTable WHERE
                    insertWhereIntoTableQuery(firstDB, firstTable, noDBPathWhere.toString());
                } else if (firstDB.equalsIgnoreCase(secondDB)) {
                    // one DB is referenced but used twice in the where clause
                    if (firstTable.equalsIgnoreCase(secondTable)) {
                        // a join is performed with the same table (self-join)
                        // SELECT: insert firstColumn into firstDB->firstTable,
                        // insert secondColumn into secondDB->secondTable
                        insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                        insertSelectIntoTableQuery(secondDB, secondTable, secondColumn);
                        // FROM: no changes
                        // WHERE: insert noDBPathWhere into firstDB->firstTable
                        // (DO NOT insert into secondDB->secondTable)
                        insertWhereIntoTableQuery(firstDB, firstTable, noDBPathWhere.toString());
                    } else {
                        // a join is performed on two different tables on the same DB
                        // (natural join)
                        // SELECT: insert firstColumn into firstDB->firstTable SELECT,
                        // insert secondColumn into secondDB->secondTable SELECT
                        insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                        insertSelectIntoTableQuery(secondDB, secondTable, secondColumn);
                        // FROM: insert firstTable into secondDB->secondTable FROM,
                        // insert secondTable into firstDB->firstTable FROM
                        insertFromIntoTableQuery(secondDB, secondTable, firstTable);
                        insertFromIntoTableQuery(firstDB, firstTable, secondTable);
                        // WHERE: insert noDBPathWhere into firstDB->firstTable WHERE,
                        // insert noDBPathWhere into secondDB->secondTable WHERE
                        insertWhereIntoTableQuery(firstDB, firstTable, noDBPathWhere.toString());
                        insertWhereIntoTableQuery(secondDB, secondTable, noDBPathWhere.toString());
                    }
                } else if (firstDB.equalsIgnoreCase(secondDB) == false) {
                    // SELECT: insert firstColumn into firstDB->firstTable SELECT,
                    // secondColumn into secondDB->secondTable SELECT
                    insertSelectIntoTableQuery(firstDB, firstTable, firstColumn);
                    insertSelectIntoTableQuery(secondDB, secondTable, secondColumn);
                    // FROM: no changes
                    // WHERE: no changes -- DO NOT send to either DB
                    // (let the aggregated query handle it)
                }
            }
        }
    }

    private void insertWhereIntoTableQuery(String dbName, String tableName,
            String whereString) {
        // throws Exceptions!
        SQLMonoDBQuery tempMonoDBQuery;
        SQLTableQuery tempTableQuery;
        if (distributedQuery.getMonoDBQuery(dbName) == null) {
            // if it doesn't exist, throw EXCEPTION!
            // throw new DBNotFoundException!!!
            System.out.println("ERROR! DBNotFoundException in SQLQueryParser.insertWhereIntoTableQuery()");
        } else {
            tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
            if (tempMonoDBQuery.getTableQuery(tableName) == null) {
                // throw new TableQueryNotFoundException!!
                System.out.println("ERROR! TableQueryNotFoundException in SQLQueryParser.insertWhereIntoTableQuery()");
            } else {
                tempTableQuery = (SQLTableQuery) tempMonoDBQuery.getTableQuery(tableName);
                // make sure that the TableQuery WHERE clause does not begin with AND or OR
                Enumeration whereEnumeration = tempTableQuery.getWhereEnumeration();
                if (whereEnumeration.hasMoreElements()) {
                    // if there are already WHERE's
                    tempTableQuery.addWhere(whereString);
                } else {
                    // strip AND or OR off whereString if it's there, then add the whereString
                    StringBuffer sb = new StringBuffer();
                    StringTokenizer stringTokenizer = new StringTokenizer(whereString, " ");
                    String currentWhere;
                    while (stringTokenizer.hasMoreTokens()) {
                        currentWhere = stringTokenizer.nextToken();
                        if (currentWhere.equalsIgnoreCase("AND")
                                || currentWhere.equalsIgnoreCase("OR")) {
                            // don't add currentWhere to the StringBuffer
                        } else {
                            sb.append(currentWhere);
                        }
                        if (stringTokenizer.hasMoreTokens())
                            sb.append(" ");
                    }
                    tempTableQuery.addWhere(sb.toString()); // add the stripped whereString
                }
            }
        }
    }

    private void insertSelectIntoTableQuery(String dbName, String tableName,
            String columnName) {
        // throws Exceptions!
        SQLMonoDBQuery tempMonoDBQuery;
        SQLTableQuery tempTableQuery;
        if (distributedQuery.getMonoDBQuery(dbName) == null) {
            // if it doesn't exist, throw EXCEPTION!
            // throw new DBNotFoundException!!!
            System.out.println("ERROR! DBNotFoundException in SQLQueryParser.insertSelectIntoTableQuery()");
        } else {
            tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
            if (tempMonoDBQuery.getTableQuery(tableName) == null) {
                // throw new TableQueryNotFoundException!!
                System.out.println("ERROR! TableQueryNotFoundException in SQLQueryParser.insertSelectIntoTableQuery()");
            } else {
                tempTableQuery = (SQLTableQuery) tempMonoDBQuery.getTableQuery(tableName);
                tempTableQuery.addSelect(tableName + "." + columnName);
            }
        }
    }

    private void insertFromIntoTableQuery(String dbName, String tableName,
            String destinationTableQuery) {
        // throws Exceptions!
        SQLMonoDBQuery tempMonoDBQuery;
        SQLTableQuery tempTableQuery;
        if (distributedQuery.getMonoDBQuery(dbName) == null) {
            // if it doesn't exist, throw EXCEPTION!
            // throw new DBNotFoundException!!!
            System.out.println("ERROR! DBNotFoundException in SQLQueryParser.insertFromIntoTableQuery()");
        } else {
            tempMonoDBQuery = (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(dbName);
            if (tempMonoDBQuery.getTableQuery(destinationTableQuery) == null) {
                // throw new TableQueryNotFoundException!!
                System.out.println("ERROR! TableQueryNotFoundException in SQLQueryParser.insertFromIntoTableQuery()");
            } else {
                tempTableQuery = (SQLTableQuery) tempMonoDBQuery.getTableQuery(destinationTableQuery);
                tempTableQuery.addFrom(tableName);
            }
        }
    }

    private String getDBPath(String pathString) {
        // throws BadDBPathException?
        int path_index = pathString.indexOf("->");
        if (path_index == -1)
            return ""; // throw BadDBPathException
        else {
            String DBPathString = pathString.substring(0, path_index);
            return DBPathString;
        }
    }

    private String cropDBPath(String pathString) {
        // throws BadDBPathException?
        int path_index = pathString.indexOf("->");
        if (path_index == -1)
            return ""; // throw BadDBPathException
        else {
            String croppedDBString = pathString.substring(path_index + 2, pathString.length());
            return croppedDBString;
        }
    }

    private String getTablePath(String pathString) {
        // throws BadTableException?
        int path_index = pathString.indexOf(".");
        if (path_index == -1)
            return ""; // throw BadTableException
        else {
            String DBPathString = pathString.substring(0, path_index);
            return DBPathString;
        }
    }

    private String cropTablePath(String pathString) {
        // throws BadTableException?
        int path_index = pathString.indexOf(".");
        if (path_index == -1)
            return ""; // throw BadTableException
        else {
            String croppedDBString = pathString.substring(path_index + 1, pathString.length());
            return croppedDBString;
        }
    }
}
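To make the splitting rules in calculateRequiredTables() concrete, the following illustration (hypothetical aliases gdb and gatc, hypothetical tables) traces the cross-database case:

// Illustrative only: how calculateRequiredTables() splits an annotated query.
//   SELECT gdb->GENES.NAME, gatc->EXPERIMENT.VALUE
//   FROM gdb->GENES, gatc->EXPERIMENT
//   WHERE gdb->GENES.ID = gatc->EXPERIMENT.GENE_ID
// yields one SQLTableQuery per table:
//   gdb:  SELECT GENES.NAME, GENES.ID FROM GENES
//   gatc: SELECT EXPERIMENT.VALUE, EXPERIMENT.GENE_ID FROM EXPERIMENT
// The cross-database condition is sent to neither database; it is resolved later
// by the aggregate query that DBDelegator runs against the local staging tables.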
DBDelegator.java

import java.util.*;
import java.sql.*;

public class DBDelegator {
    private ClassMapRepository classMapRepository;
    private InformixJDBCHandler localDBHandler;
    private Hashtable remoteDBHandlerHashtable;
    private DistributedQuery distributedQuery;
    private Vector localTableNamesVector;

    public DBDelegator(DistributedQuery distributedQuery) {
        remoteDBHandlerHashtable = new Hashtable();
        localTableNamesVector = new Vector();
        this.distributedQuery = distributedQuery;
        if (distributedQuery != null)
            this.classMapRepository = distributedQuery.getClassMapRepository();
    }

    public DBDelegator() {
        remoteDBHandlerHashtable = new Hashtable();
        localTableNamesVector = new Vector();
    }

    public void setClassMapRepository(ClassMapRepository classMapRepository) {
        this.classMapRepository = classMapRepository;
    }

    public ClassMapRepository getClassMapRepository() {
        return classMapRepository;
    }

    public void setDistributedQuery(DistributedQuery distributedQuery) {
        this.distributedQuery = distributedQuery;
    }

    public String getFinalResultString() {
        setupLocalDBHandler();
        setupRemoteDBHandlers();
        processTableQueries();
        ResultSet finalResultSet = getAggregateQueryResultSet();
        return resultSetToString(finalResultSet);
    }

    public void setupLocalDBHandler() {
        // values need to be changed if a change is made to the configuration of the localDB
        localDBHandler = new InformixJDBCHandler(true);
        localDBHandler.setUrl("jdbc:informix-sqli://18.66.0.25:1013/bfuthesis:INFORMIXSERVER=ICMIT");
        localDBHandler.setUser("informix");
        localDBHandler.setPassword("AndrewMc");
    }

    public void setupRemoteDBHandlers() {
        if (classMapRepository != null) {
            Enumeration dbEnumeration = classMapRepository.getDBEnumeration();
            while (dbEnumeration.hasMoreElements()) {
                String currentDB = (String) dbEnumeration.nextElement();
                ClassMap classMap = (ClassMap) classMapRepository.getClassMap(currentDB);
                String connectionUser = classMap.getUser();
                String connectionPassword = classMap.getPassword();
                String connectionIP = classMap.getIP();
                String connectionPort = classMap.getPort();
                String connectionDBName = classMap.getDatabaseName();
                String connectionDBAlias = classMap.getDatabaseAlias();
                String connectionINFORMIXSERVER = classMap.getAdditionalParameters();
                // strip the 'INFORMIXSERVER=' part of the string
                connectionINFORMIXSERVER =
                        connectionINFORMIXSERVER.substring(connectionINFORMIXSERVER.indexOf("=") + 1);
                InformixJDBCHandler currentHandler = new InformixJDBCHandler(false); // no write access
                currentHandler.setUser(connectionUser);
                currentHandler.setPassword(connectionPassword);
                currentHandler.setIP(connectionIP);
                currentHandler.setPort(connectionPort);
                currentHandler.setDB(connectionDBName);
                currentHandler.setINFORMIXSERVER(connectionINFORMIXSERVER);
                currentHandler.updateUrl();
                remoteDBHandlerHashtable.put(connectionDBAlias, currentHandler);
            }
        } else {
            System.out.println("ERROR! DBDelegator.setupRemoteDBHandlers(): ClassMapRepository not set/initialized!");
        }
    }

    public void processTableQueries() {
        if ((localDBHandler != null) && (remoteDBHandlerHashtable.size() != 0)
                && (distributedQuery != null)) {
            Enumeration monoDBKeysEnumeration = distributedQuery.getMonoDBKeys();
            while (monoDBKeysEnumeration.hasMoreElements()) {
                String currentMonoDBString = (String) monoDBKeysEnumeration.nextElement();
                SQLMonoDBQuery currentMonoDBQuery =
                        (SQLMonoDBQuery) distributedQuery.getMonoDBQuery(currentMonoDBString);
                if (currentMonoDBQuery == null) {
                    System.out.println("ERROR! DBDelegator.processTableQueries(): No MonoDBQuery's in distributedQuery object!");
                } else {
                    // grab a handle of the monoDB Handler HERE
                    String dbAlias = currentMonoDBQuery.getDatabaseName();
                    InformixJDBCHandler currentDBHandler =
                            (InformixJDBCHandler) remoteDBHandlerHashtable.get(dbAlias);
                    if (currentDBHandler == null) {
                        // bad dbAlias or no handler registered with that key
                        System.out.println("ERROR! DBDelegator.processTableQueries(): dbAlias does not have an InformixJDBCHandler!");
                    } else {
                        Enumeration tableQueryKeysEnumeration = currentMonoDBQuery.getTableKeys();
                        while (tableQueryKeysEnumeration.hasMoreElements()) {
                            String currentTableQueryString =
                                    (String) tableQueryKeysEnumeration.nextElement();
                            SQLTableQuery currentTableQuery =
                                    (SQLTableQuery) currentMonoDBQuery.getTableQuery(currentTableQueryString);
                            if (currentTableQuery == null) {
                                System.out.println("ERROR! DBDelegator.processTableQueries(): No TableQuery's in currentMonoDBQuery object!");
                            } else {
                                String currentSQL = currentTableQuery.toSQLString();
                                try {
                                    ResultSet currentResultSet = currentDBHandler.getResultSet(currentSQL);
                                    String localTableName = dbAlias + "_" + currentTableQuery.getTableName();
                                    // create table [DBAlias]_[TableName]
                                    // localDBHandler.insertResultSet(localTableName, currentResultSet);
                                    localDBHandler.insertTest(localTableName, currentResultSet);
                                    localTableNamesVector.addElement(localTableName);
                                    // keep track of the Table Names added to the localDB
                                } catch (SQLException e) {
                                    System.out.println("ERROR! Problems processing DBDelegator.processTableQueries() SQL ERROR:" + e.getMessage());
                                }
                            }
                        }
                    }
                }
            }
        } else {
            System.out.println("ERROR! DBDelegator.processTableQueries(): uninitialized object error!");
        }
    }

    public void dropLocalTables() {
        try {
            for (int i = 0; i < localTableNamesVector.size(); i++) {
                String currentTable = (String) localTableNamesVector.elementAt(i);
                localDBHandler.dropTable(currentTable);
            }
            localTableNamesVector = new Vector();
        } catch (SQLException e) {
            System.out.println("ERROR! Problems processing DBDelegator.dropLocalTables(): SQL ERROR:" + e.getMessage());
        }
    }

    public String convertToAggregateQuery(String dbPathQuery) {
        // assumes each [DBPath]->[Table] will be stored in the localDB
        // with the name '[DBPath]_[Table]'
        StringTokenizer stringTokenizer = new StringTokenizer(dbPathQuery, "->");
        StringBuffer sb = new StringBuffer();
        String currentToken;
        while (stringTokenizer.hasMoreTokens()) {
            currentToken = stringTokenizer.nextToken();
            sb.append(currentToken);
            if (stringTokenizer.hasMoreTokens())
                sb.append("_");
        }
        return sb.toString();
    }

    public ResultSet getAggregateQueryResultSet() {
        try {
            String aggregateQuery =
                    convertToAggregateQuery(distributedQuery.getDBPathQueryString());
            distributedQuery.setAggregateQueryString(aggregateQuery);
            return localDBHandler.getResultSet(aggregateQuery);
        } catch (SQLException e) {
            System.out.println("ERROR! Problems processing DBDelegator.getAggregateQueryResultSet(): SQL ERROR: " + e.getMessage());
            return null;
        }
    }

    public String resultSetToString(ResultSet resultSet) {
        StringBuffer sb = new StringBuffer();
        try {
            ResultSetMetaData metaData = resultSet.getMetaData();
            for (int i = 1; i <= metaData.getColumnCount(); i++) {
                sb.append(metaData.getColumnName(i));
                if (i == metaData.getColumnCount())
                    sb.append("\n"); // last column
                else
                    sb.append("\t"); // tabbed out
            }
            while (resultSet.next()) {
                for (int i = 1; i <= metaData.getColumnCount(); i++) {
                    sb.append(resultSet.getString(i));
                    if (i == metaData.getColumnCount())
                        sb.append("\n"); // last column
                    else
                        sb.append("\t"); // tabbed out
                }
            }
        } catch (SQLException e) {
            System.out.println("ERROR! Problems processing DBDelegator.resultSetToString(): SQL ERROR:" + e.getMessage());
        }
        return sb.toString();
    }
}
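The end-to-end flow, sketched for illustration only (the wiring is hypothetical; connection details live in the class maps and in setupLocalDBHandler()):

// Illustrative pipeline sketch -- not part of the thesis listings.
QueryDecomposer decomposer = new QueryDecomposer();
decomposer.setClassMapRepository(repository);     // assumes a populated repository
decomposer.processQuery(federatedQueryString);    // plain SQL over federated tables
DBDelegator delegator = new DBDelegator(decomposer.getDistributedQuery());
String result = delegator.getFinalResultString(); // sub-queries, staging, aggregate query
delegator.dropLocalTables();                      // remove the [DBAlias]_[TableName] staging tables
System.out.println(result);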
SQLJDBCHandler.java

import java.sql.*;
import com.informix.jdbc.*;
import java.io.*;

abstract class SQLJDBCHandler {
    private String connectionUrl = null;
    private String connectionUser = null;
    private String connectionPassword = null;
    private String connectionIP = null;
    private String connectionPort = null;
    private String connectionDB = null;
    private String connectionINFORMIXSERVER = null;
    private boolean writeAccess = false;

    public abstract ResultSet getResultSet(String statementString) throws SQLException;

    public void setParams(String url, String user, String password) {
        setUrl(url);
        setUser(user);
        setPassword(password);
    }

    public abstract void updateUrl(); // updates the connectionUrl with the params

    public void setUrl(String url) {
        connectionUrl = url;
    }

    public String getUrl() {
        return connectionUrl;
    }

    public void setUser(String user) {
        connectionUser = user;
    }

    public String getUser() {
        return connectionUser;
    }

    public void setPassword(String password) {
        connectionPassword = password;
    }

    public String getPassword() {
        return connectionPassword;
    }

    public void setIP(String ipString) {
        connectionIP = ipString;
    }

    public String getIP() {
        return connectionIP;
    }

    public void setPort(String port) {
        connectionPort = port;
    }

    public String getPort() {
        return connectionPort;
    }

    public void setDB(String DB) {
        connectionDB = DB;
    }

    public String getDB() {
        return connectionDB;
    }

    public void setINFORMIXSERVER(String server) {
        connectionINFORMIXSERVER = server;
    }

    public String getINFORMIXSERVER() {
        return connectionINFORMIXSERVER;
    }

    //***WRITE methods***//
    public abstract void insertResultSet(String tableName, ResultSet resultSet)
            throws SQLException;

    public abstract void releaseResultSetResources(ResultSet resultSet)
            throws SQLException;

    public abstract String generateTableSQL(String tableName, ResultSet resultSet)
            throws SQLException;
    // returns SQL for recreating tables from a ResultSet

    public abstract void copyResultSet(ResultSet origResultSet, ResultSet copyResultSet)
            throws SQLException;
}
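Because the concrete Informix handler below extends this abstract class, delegation code can hold handlers through the abstract type. A sketch (hypothetical URL and credentials; assumes the enclosing method is declared 'throws SQLException'):

// Illustrative only: callers see the abstract surface, not the Informix specifics.
SQLJDBCHandler handler = new InformixJDBCHandler(false); // read-only handler
handler.setParams("jdbc:informix-sqli://host:1013/db:INFORMIXSERVER=server", // hypothetical
                  "user", "password");
ResultSet rs = handler.getResultSet("SELECT tabname FROM systables");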
InformixJDBCHandler.java

import java.sql.*;
import com.informix.jdbc.*;
import java.io.*;

public class InformixJDBCHandler extends SQLJDBCHandler {
    private String connectionUrl = null;
    private String connectionUser = null;
    private String connectionPassword = null;
    private String connectionIP = null;
    private String connectionPort = null;
    private String connectionDB = null;
    private String connectionINFORMIXSERVER = null;
    private boolean writeAccess = false;

    public InformixJDBCHandler(boolean writeBoolean) {
        this.writeAccess = writeBoolean;
    }

    public ResultSet getResultSet(String statementString) throws SQLException {
        if (connectionUrl == null)
            throw new SQLException("ERROR: Connection URL is null!");
        else if (connectionUser == null)
            throw new SQLException("ERROR: Connection User is null!");
        else if (connectionPassword == null)
            throw new SQLException("ERROR: Connection Password is null!");
        String cmd = statementString;
        ResultSet resultSet = null;
        Connection conn = null;
        try {
            Class.forName("com.informix.jdbc.IfxDriver"); // Load Informix JDBC driver
        } catch (Exception e) {
            throw new SQLException("ERROR: failed to load Informix JDBC driver." + "(" + e.getMessage() + ")");
        }
        try {
            conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword);
            // Make the connection to the DB through the URL, authenticating with user/password
        } catch (SQLException e) {
            throw new SQLException("ERROR: failed to connect!" + "(" + e.getMessage() + ")");
        }
        try {
            Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                    ResultSet.CONCUR_UPDATABLE);
            resultSet = stmt.executeQuery(cmd);
        } catch (SQLException e) {
            throw new SQLException("ERROR: execution failed - statement:" + cmd + "(" + e.getMessage() + ")");
        }
        return resultSet;
    }

    public void setParams(String url, String user, String password) {
        setUrl(url);
        setUser(user);
        setPassword(password);
    }

    public void updateUrl() {
        // updates the connectionUrl with the params
        connectionUrl = "jdbc:informix-sqli://" + connectionIP + ":" + connectionPort
                + "/" + connectionDB + ":" + "INFORMIXSERVER=" + connectionINFORMIXSERVER;
    }

    public void setUrl(String url) {
        connectionUrl = url;
    }

    public String getUrl() {
        return connectionUrl;
    }

    public void setUser(String user) {
        connectionUser = user;
    }

    public String getUser() {
        return connectionUser;
    }

    public void setPassword(String password) {
        connectionPassword = password;
    }

    public String getPassword() {
        return connectionPassword;
    }

    public void setIP(String ipString) {
        connectionIP = ipString;
    }

    public String getIP() {
        return connectionIP;
    }

    public void setPort(String port) {
        connectionPort = port;
    }

    public String getPort() {
        return connectionPort;
    }

    public void setDB(String DB) {
        connectionDB = DB;
    }

    public String getDB() {
        return connectionDB;
    }

    public void setINFORMIXSERVER(String server) {
        connectionINFORMIXSERVER = server;
    }

    public String getINFORMIXSERVER() {
        return connectionINFORMIXSERVER;
    }

    //***WRITE methods***//
    public void insertResultSet(String tableName, ResultSet resultSet) throws SQLException {
        if (writeAccess == false)
            throw new SQLException("ERROR: DB not initialized for write access!");
        else if (connectionUrl == null)
            throw new SQLException("ERROR: Connection URL is null!");
        else if (connectionUser == null)
            throw new SQLException("ERROR: Connection User is null!");
        else if (connectionPassword == null)
            throw new SQLException("ERROR: Connection Password is null!");
        Connection conn = null;
        try {
            Class.forName("com.informix.jdbc.IfxDriver"); // Load Informix JDBC driver
        } catch (Exception e) {
            throw new SQLException("ERROR: failed to load Informix JDBC driver." + "(" + e.getMessage() + ")");
        }
        try {
            conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword);
            // Make the connection to the DB through the URL, authenticating with user/password
        } catch (SQLException e) {
            throw new SQLException("ERROR: failed to connect!" + "(" + e.getMessage() + ")");
        }
        try {
            Statement stmt = conn.createStatement();
            String tableStatementString = generateTableSQL(tableName, resultSet);
            int tablestatement = stmt.executeUpdate(tableStatementString);
            stmt.close(); // release the DB resources for the statement
        } catch (SQLException e) {
            throw new SQLException("ERROR: execution failed - ResultSet Insert:" + "(" + e.getMessage() + ")");
        }
    }

    public void insertTest(String tableName, ResultSet resultSet) throws SQLException {
        try {
            insertResultSet(tableName, resultSet);
            ResultSet destinationResultSet = getResultSet("SELECT * FROM " + tableName);
            // grab a handle on the newly created table
            ResultSetMetaData metaData = destinationResultSet.getMetaData();
            // MUST HAVE THIS LINE -- BUG IN INFORMIX JDBC driver!!!
            copyResultSet(resultSet, destinationResultSet);
            releaseResultSetResources(resultSet);
            releaseResultSetResources(destinationResultSet);
        } catch (SQLException e) {
            throw new SQLException("ERROR: execution failed - insertTest" + "(" + e.getMessage() + ")");
        }
    }

    public void dropTable(String tableString) throws SQLException {
        if (writeAccess == false)
            throw new SQLException("ERROR: DB not initialized for write access!");
        else if (connectionUrl == null)
            throw new SQLException("ERROR: Connection URL is null!");
        else if (connectionUser == null)
            throw new SQLException("ERROR: Connection User is null!");
        else if (connectionPassword == null)
            throw new SQLException("ERROR: Connection Password is null!");
        Connection conn = null;
        try {
            Class.forName("com.informix.jdbc.IfxDriver"); // Load Informix JDBC driver
        } catch (Exception e) {
            throw new SQLException("ERROR: failed to load Informix JDBC driver." + "(" + e.getMessage() + ")");
        }
        try {
            conn = DriverManager.getConnection(connectionUrl, connectionUser, connectionPassword);
            // Make the connection to the DB through the URL, authenticating with user/password
        } catch (SQLException e) {
            throw new SQLException("ERROR: failed to connect!" + "(" + e.getMessage() + ")");
        }
        try {
            Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                    ResultSet.CONCUR_UPDATABLE);
            stmt.execute("DROP TABLE " + tableString);
            stmt.close(); // release the DB resources for the statement
        } catch (SQLException e) {
            throw new SQLException("ERROR: execution failed - DROP TABLE:"
                    + " (e.getSQLState()=" + e.getSQLState()
                    + " e.getErrorCode()=" + e.getErrorCode() + ")");
        }
    }

    public void releaseResultSetResources(ResultSet resultSet) throws SQLException {
        Statement stmt = resultSet.getStatement();
        resultSet.close();
        stmt.close();
    }

    public String generateTableSQL(String tableName, ResultSet resultSet)
            throws SQLException {
        // returns SQL for recreating tables from a ResultSet
        int column_type = 0;
        String columnNameString = "";
        String columnTypeString = "";
        StringBuffer sb = new StringBuffer();
        try {
            ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
            sb.append("CREATE TABLE " + tableName + " ( "); // map table name with some significance
            int numberofcolumns = resultSetMetaData.getColumnCount();
            for (int i = 1; i <= numberofcolumns; i++) {
                // column_type = resultSetMetaData.getColumnType(i);
                columnNameString = resultSetMetaData.getColumnName(i);
                columnTypeString = JDBCtoInformixType(resultSetMetaData.getColumnTypeName(i));
                sb.append("\n    ");
                sb.append(columnNameString + " " + columnTypeString); // getTypeName(column_type));
                if (columnTypeString.equalsIgnoreCase("CHAR")
                        || columnTypeString.equalsIgnoreCase("VARCHAR")
                        || columnTypeString.equalsIgnoreCase("DECIMAL")
                        || columnTypeString.equalsIgnoreCase("LONGVARCHAR")) {
                    // insert size if the column is of the above types
                    sb.append(" (" + resultSetMetaData.getColumnDisplaySize(i) + ")");
                }
                if (i < numberofcolumns)
                    sb.append(",");
            }
            sb.append("\n );");
        } catch (SQLException e) {
            throw new SQLException("ERROR: execution failed - ResultSetMetaData cannot get column count:" + "(" + e.getMessage() + ")");
        }
        return sb.toString();
    }

    public void copyResultSet(ResultSet origResultSet, ResultSet copyResultSet)
            throws SQLException {
        // Req: Table column types must be the same and be indexed!
        ResultSetMetaData origMetaData = origResultSet.getMetaData();
        int type_int;
        while (origResultSet.next() == true) {
            copyResultSet.moveToInsertRow(); // moves cursor to the insert row
            for (int i = 1; i <= origMetaData.getColumnCount(); i++) {
                // for each column in the row
                type_int = origMetaData.getColumnType(i);
                if (type_int == Types.ARRAY) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.BIGINT) {
                    copyResultSet.updateInt(i, origResultSet.getInt(i));
                } else if (type_int == Types.BINARY) {
                    try {
                        InputStream tempInputStream = origResultSet.getBinaryStream(i);
                        copyResultSet.updateBinaryStream(i, tempInputStream,
                                tempInputStream.available());
                        // check if tempInputStream.available() is correct
                    } catch (IOException e) {
                        throw new SQLException("ERROR: Problems processing InputStream in Query: " + e.getMessage());
                    }
                } else if (type_int == Types.BIT) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.BLOB) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.CHAR) {
                    copyResultSet.updateString(i, origResultSet.getString(i));
                } else if (type_int == Types.CLOB) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.DATE) {
                    copyResultSet.updateDate(i, origResultSet.getDate(i));
                } else if (type_int == Types.DECIMAL) {
                    copyResultSet.updateBigDecimal(i, origResultSet.getBigDecimal(i));
                } else if (type_int == Types.DISTINCT) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.DOUBLE) {
                    copyResultSet.updateDouble(i, origResultSet.getDouble(i));
                } else if (type_int == Types.FLOAT) {
                    copyResultSet.updateFloat(i, origResultSet.getFloat(i));
                } else if (type_int == Types.INTEGER) {
                    copyResultSet.updateInt(i, origResultSet.getInt(i));
                } else if (type_int == Types.JAVA_OBJECT) {
                    copyResultSet.updateObject(i, origResultSet.getObject(i));
                } else if (type_int == Types.LONGVARBINARY) {
                    try {
                        InputStream tempInputStream = origResultSet.getBinaryStream(i);
                        copyResultSet.updateBinaryStream(i, tempInputStream,
                                tempInputStream.available());
                    } catch (IOException e) {
                        throw new SQLException("ERROR: Problems processing InputStream in Query: " + e.getMessage());
                    }
                } else if (type_int == Types.LONGVARCHAR) {
                    copyResultSet.updateString(i, origResultSet.getString(i));
                } else if (type_int == Types.NULL) {
                    copyResultSet.updateNull(i);
                } else if (type_int == Types.NUMERIC) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.OTHER) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.REAL) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.REF) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.SMALLINT) {
                    copyResultSet.updateInt(i, origResultSet.getInt(i));
                } else if (type_int == Types.STRUCT) {
                    // CANNOT COPY WITH JDBC
                    // throw Exception
                } else if (type_int == Types.TIME) {
                    copyResultSet.updateTime(i, origResultSet.getTime(i));
                } else if (type_int == Types.TIMESTAMP) {
                    copyResultSet.updateTimestamp(i, origResultSet.getTimestamp(i));
                } else if (type_int == Types.TINYINT) {
                    copyResultSet.updateInt(i, origResultSet.getInt(i));
                } else if (type_int == Types.VARBINARY) {
                    try {
                        InputStream tempInputStream = origResultSet.getBinaryStream(i);
                        copyResultSet.updateBinaryStream(i, tempInputStream,
                                tempInputStream.available());
                    } catch (IOException e) {
                        throw new SQLException("ERROR: Problems processing InputStream in Query: " + e.getMessage());
                    }
                } else if (type_int == Types.VARCHAR) {
                    copyResultSet.updateString(i, origResultSet.getString(i));
                }
            }
            copyResultSet.insertRow();
            copyResultSet.moveToCurrentRow();
        }
        Connection conn = (copyResultSet.getStatement()).getConnection();
    }

    // mapping between JDBC Types and Informix Types at
    // http://www.informix.com/answers/english/docs/220sdk/jdbc14/program.fm4.html
    //
    // JDBC API Data Type from java.sql.Types    Corresponding Informix Data Type
    // BIGINT                                    INT8
    // BINARY                                    BYTE
    // BIT                                       Not supported
    // CHAR                                      CHAR(n)
    // DATE                                      DATE
    // DECIMAL                                   DECIMAL
    // DOUBLE                                    FLOAT
    // FLOAT                                     SMALLFLOAT
    // INTEGER                                   INTEGER
    // LONGVARBINARY                             BYTE
    // LONGVARCHAR                               TEXT
    // NUMERIC                                   DECIMAL
    // REAL                                      SMALLFLOAT
    // SMALLINT                                  SMALLINT
    // TIME                                      DATETIME
    // TIMESTAMP                                 DATETIME
    // TINYINT                                   SMALLINT
    // VARBINARY                                 BYTE
    // VARCHAR                                   VARCHAR(m,r)
    private String JDBCtoInformixType(String type) {
        if (type.equalsIgnoreCase("BIGINT"))
            return "INT8";
        else if (type.equalsIgnoreCase("BINARY"))
            return "BYTE";
        else if (type.equalsIgnoreCase("BIT"))
            return ""; // NOT SUPPORTED!
        else if (type.equalsIgnoreCase("CHAR"))
            return "CHAR";
        else if (type.equalsIgnoreCase("DATE"))
            return "DATE";
        else if (type.equalsIgnoreCase("DECIMAL"))
            return "DECIMAL";
        else if (type.equalsIgnoreCase("DOUBLE"))
            return "FLOAT";
        else if (type.equalsIgnoreCase("FLOAT"))
            return "SMALLFLOAT";
        else if (type.equalsIgnoreCase("INTEGER") || type.equalsIgnoreCase("INT"))
            return "INTEGER";
        else if (type.equalsIgnoreCase("LONGVARBINARY"))
            return "BYTE";
        else if (type.equalsIgnoreCase("LONGVARCHAR"))
            return "TEXT";
        else if (type.equalsIgnoreCase("NUMERIC"))
            return "DECIMAL";
        else if (type.equalsIgnoreCase("REAL"))
            return "SMALLFLOAT";
        else if (type.equalsIgnoreCase("SMALLINT"))
            return "SMALLINT";
        else if (type.equalsIgnoreCase("TIME"))
            return "DATETIME";
        else if (type.equalsIgnoreCase("TIMESTAMP"))
            return "DATETIME";
        else if (type.equalsIgnoreCase("TINYINT"))
            return "SMALLINT";
        else if (type.equalsIgnoreCase("VARBINARY"))
            return "BYTE";
        else if (type.equalsIgnoreCase("VARCHAR"))
            return "VARCHAR";
        return "";
    }

    private String InformixtoJDBCType(String type) {
        if (type.equalsIgnoreCase("INT8"))
            return "BIGINT";
        else if (type.equalsIgnoreCase("BYTE"))
            return "BINARY";
        else if (type.equalsIgnoreCase("CHAR"))
            return "CHAR";
        else if (type.equalsIgnoreCase("DATE"))
            return "DATE";
        else if (type.equalsIgnoreCase("DECIMAL"))
            return "DECIMAL";
        else if (type.equalsIgnoreCase("FLOAT"))
            return "DOUBLE";
        else if (type.equalsIgnoreCase("SMALLFLOAT"))
            return "FLOAT";
        else if (type.equalsIgnoreCase("INTEGER") || type.equalsIgnoreCase("INT"))
            return "INTEGER";
        else if (type.equalsIgnoreCase("TEXT"))
            return "LONGVARCHAR";
        else if (type.equalsIgnoreCase("SMALLFLOAT"))
            return "REAL"; // unreachable: SMALLFLOAT is already matched above
        else if (type.equalsIgnoreCase("SMALLINT"))
            return "SMALLINT";
        else if (type.equalsIgnoreCase("DATETIME"))
            return "TIME";
        else if (type.equalsIgnoreCase("VARCHAR"))
            return "VARCHAR";
        return "";
    }

    private String getTypeName(int type_int) {
        if (type_int == Types.ARRAY)
            return "ARRAY";
        else if (type_int == Types.BIGINT)
            return "BIGINT";
        else if (type_int == Types.BINARY)
            return "BINARY";
        else if (type_int == Types.BIT)
            return "BIT";
        else if (type_int == Types.BLOB)
            return "BLOB";
        else if (type_int == Types.CHAR)
            return "CHAR";
        else if (type_int == Types.CLOB)
            return "CLOB";
        else if (type_int == Types.DATE)
            return "DATE";
        else if (type_int == Types.DECIMAL)
            return "DECIMAL";
        else if (type_int == Types.DISTINCT)
            return "DISTINCT";
        else if (type_int == Types.DOUBLE)
            return "DOUBLE";
        else if (type_int == Types.FLOAT)
            return "FLOAT";
        else if (type_int == Types.INTEGER)
            return "INTEGER";
        else if (type_int == Types.JAVA_OBJECT)
            return "JAVA_OBJECT";
        else if (type_int == Types.LONGVARBINARY)
            return "LONGVARBINARY";
        else if (type_int == Types.LONGVARCHAR)
            return "LONGVARCHAR";
        else if (type_int == Types.NULL)
            return "NULL";
        else if (type_int == Types.NUMERIC)
            return "NUMERIC";
        else if (type_int == Types.OTHER)
            return "OTHER";
        else if (type_int == Types.REAL)
            return "REAL";
        else if (type_int == Types.REF)
            return "REF";
        else if (type_int == Types.SMALLINT)
            return "SMALLINT";
        else if (type_int == Types.STRUCT)
            return "STRUCT";
        else if (type_int == Types.TIME)
            return "TIME";
        else if (type_int == Types.TIMESTAMP)
            return "TIMESTAMP";
        else if (type_int == Types.TINYINT)
            return "TINYINT";
        else if (type_int == Types.VARBINARY)
            return "VARBINARY";
        else if (type_int == Types.VARCHAR)
            return "VARCHAR";
        else
            return "";
    }
}
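For completeness, an illustrative driver for the handler (all connection values hypothetical, not the thesis configuration; assumes the enclosing method is declared 'throws SQLException'):

// Illustrative sketch -- hypothetical connection values and table names.
InformixJDBCHandler handler = new InformixJDBCHandler(true); // write access enabled
handler.setIP("127.0.0.1");
handler.setPort("1013");
handler.setDB("gdb");
handler.setINFORMIXSERVER("ICMIT");
handler.setUser("informix");
handler.setPassword("password");
handler.updateUrl(); // assembles the jdbc:informix-sqli URL from the parts above
ResultSet rs = handler.getResultSet("SELECT * FROM GENES");
handler.insertTest("GENES_COPY", rs); // CREATE TABLE + row-by-row copy; releases rs when done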