Scalable and Extensible Infrastructures for Distributing
Interoperable Geographic Information Services on the
Internet

by

NADINE S. ALAMEH

B.E., Computer and Communication Engineering, American University of Beirut (1994)
M.S., Civil and Environmental Engineering, Massachusetts Institute of Technology (1997)
M.C.P., Urban Studies and Planning, Massachusetts Institute of Technology (1997)

Submitted to the Department of Civil and Environmental Engineering
in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer and Information Systems Engineering

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2001

© Nadine Alameh, 2001. All Rights Reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and
electronic copies of this document in whole or in part, and to grant others the right to do so.
Author:
Department of Civil and Environmental Engineering
January 19, 2001

Certified by:
Joseph Ferreira
Professor of Urban Planning and Operations Research
Thesis Supervisor

Certified by:
John Williams
Professor of Civil and Environmental Engineering
Thesis Reader

Accepted by:
Oral Buyukozturk
Chairman, Departmental Committee on Graduate Students
Scalable and Extensible Infrastructures for Distributing Interoperable
Geographic Information Services on the Internet
by
Nadine S. Alameh
Submitted to the Department of Civil and Environmental Engineering on
January 19, 2001, in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Computer and Information Systems Engineering
Abstract
The explosive growth in Internet-powered services has fueled the search for new killer Internet-based applications. This search has often led to applications based on Geographic Information Systems (GIS), especially in the emerging field of the Mobile Internet. Unfortunately, the traditional GIS model falls short of accommodating the requirements and needs of the Internet environment.

A more flexible GIS model is required to support the growing need for sharing increasingly available yet distributed geographic data, and for facilitating the integration of GIS with other information systems. Such a model will be especially beneficial in scientific research, engineering modeling, and state and federal government settings, where tightly coupled hierarchical systems are unlikely to have the desired breadth and flexibility. This next-generation flexible GIS model would deliver GIS functionality as independently provided, yet interoperable, services over the Internet. Such services can then be dynamically chained to construct customized applications.

The goal of this thesis is to develop a framework for building a scalable and extensible infrastructure that can support and facilitate the dynamic chaining of distributed services. Towards that goal, the thesis evaluates and contrasts a set of alternative architectures. In doing so, it identifies the key elements and players, and focuses on issues pertaining to error handling, back-tracing of data and services in transactions, and service discovery and network management.

A detailed analysis of a typical use case shows that a federated architecture is the most promising in terms of meeting the scalability, extensibility and flexibility requirements of the infrastructure. In this context, the thesis stresses the necessity of service and catalog interoperability, the need for GIS metadata standards that comply with general IT standards, and the usefulness of XML in defining extensible GIS data exchange standards. The thesis argues that the sustainability of a distributed infrastructure also depends on successful organizational partnerships, scalable schemes for network management, and technical enhancements of GIS services in terms of data streaming techniques and effective compression standards for GIS data on the Internet.
Thesis Supervisor: Joseph Ferreira
Title: Professor of Urban Planning and Operations Research
Acknowledgments
Thank God it's finally over! The Ph.D. process has been an extremely demanding experience,
sometimes more on the emotional level than the intellectual one. And truth be told, I could never
have gone through it without the help and support of several people around me.
I would like to thank Professor J. Ferreira, my research advisor for six years, for his support and
guidance throughout this study and my stay at MIT. He has been a great inspiration. I have gained
a lot from his expertise, his enthusiasm and his interest in how technologies can shape our lives.
My sincere thanks also go to Professor K. Amaratunga for his patience and his time, and to Professor J. Williams for his help throughout my graduate studies. My most sincere thanks also go to
Cynthia Stewart for her patience and support, especially during the last few months.
I consider myself very lucky to have had John Evans as my office-mate for the last three years. I
thank him for his help, sense of humor and most of all his friendship. Thanks for having the
patience to listen to me nagging about this thesis on a daily basis! Many special thanks to the CRL
staff for making me feel right at home, especially Tom Grayson who always knew how to cheer me
up! And of course, the PSS alums for their support and friendship, especially Raj Singh, Matt Gentile, and Ayman Ismail.
Throughout my years at MIT, I have made very special friends who have made my stay here very
enjoyable! I thank them dearly, especially Fadi Karameh, Saad Mneimneh and Mazen Wehbeh
from the Lebanese gang, and Petros Komodoros, Salal Humair and Terry Vendlins from the 1.00
gang! I would also like to thank my Jazzercise buddies and instructors for making me look forward
to that one energy-boosting hour every day!
I can never thank enough my husband and best friend, Hisham Kassab, for his love and patience,
his endless attempts at motivating me and for accommodating my weird mood swings! At some
level, this experience has tremendously strengthened our relationship and I am thankful for that. I
feel very lucky to have someone as nice, so caring and so smart by my side, now and in the future.
Finally, I thank my family for the unconditional support they provided me! I especially thank my
mother for her endless prayers and her daily support whether on the phone or through emails. My
thanks to my sister, my "baby" and great friend Rola (aka Roro), for enriching my life in ways I
could never describe. And I thank my brother Rani for his support and encouragement, and for teaching
me so much about patience, perseverance and hope. I dedicate this thesis to him.
Table of Contents

1 Introduction ............................................................... 9
   1.1 Overview .............................................................. 9
   1.2 Motivation ........................................................... 10
       1.2.1 Benefits of a Distributed GIS Infrastructure .................. 11
       1.2.2 The Significance of Scalability and Interoperability .......... 13
       1.2.3 The Sources of Complexity ..................................... 15
   1.3 Objectives and Contributions ........................................ 18
   1.4 Research Methodology ................................................ 19
       1.4.1 Looking at Existing Technologies and Efforts .................. 20
       1.4.2 Learning by Doing: A Prototyping Experiment ................... 20
       1.4.3 Identifying Basic Architectural Elements and Setups ........... 20
   1.5 Thesis Organization ................................................. 21
2 Background ................................................................ 22
   2.1 The Evolution of GIS ................................................ 23
       2.1.1 Legacy GIS Systems ............................................ 23
       2.1.2 Influences of Emerging Technologies ........................... 23
       2.1.3 Availability of Spatial Data .................................. 25
       2.1.4 Impact of the Internet ........................................ 26
       2.1.5 Emerging Role of GIS in Today's Enterprises ................... 26
       2.1.6 Mobile and Wireless Technologies .............................. 27
   2.2 Interoperability and Standards ...................................... 27
       2.2.1 Interoperability .............................................. 28
       2.2.2 Standards ..................................................... 29
   2.3 Ongoing Standardization Efforts ..................................... 33
       2.3.1 Early Standardization Efforts ................................. 33
       2.3.2 Spatial Data Transfer Standard (SDTS) ......................... 33
       2.3.3 National Spatial Data Infrastructure (NSDI) ................... 34
       2.3.4 Open GIS Consortium (OGC) ..................................... 35
       2.3.5 World Wide Web Consortium (W3C) ............................... 37
       2.3.6 Other Efforts ................................................. 38
   2.4 The Internet ........................................................ 39
       2.4.1 Identifying Key Success Factors ............................... 39
       2.4.2 Implications .................................................. 42
   2.5 Application Service Providers (ASP) ................................. 43
       2.5.1 Overview ...................................................... 44
       2.5.2 Issues ........................................................ 45
       2.5.3 Implications .................................................. 47
3 Case Study: Raster Image Re-projection Service Prototype ................. 49
   3.1 Introduction ........................................................ 49
   3.2 Re-projection: Overview and Motivation .............................. 50
       3.2.1 Overview ...................................................... 50
       3.2.2 Motivation: Reasons for Picking Re-projection ................. 51
   3.3 Re-projection Service: Design Process and Preliminary Interface ..... 52
       3.3.1 Interface Design Process ...................................... 53
       3.3.2 Image Re-projection Interface ................................. 53
       3.3.3 GetCapabilities Request ....................................... 56
   3.4 Re-projection Service: A Prototype Implementation ................... 58
       3.4.1 Implementation Options ........................................ 59
       3.4.2 Prototype ..................................................... 60
       3.4.3 Chaining Prototype with MITOrtho Server ....................... 63
   3.5 Synthesis of Observations and Findings .............................. 66
       3.5.1 Standards and Interoperability Issues ......................... 66
       3.5.2 Inherent GIS Design Issues .................................... 68
       3.5.3 Distributed Infrastructure and Chaining Issues ................ 69
       3.5.4 Distinctive Characteristics of the GIS Problem ................ 71
4 Architectures: Components, Chaining and Issues ........................... 75
   4.1 Overview ........................................................... 75
   4.2 Approach ........................................................... 75
       4.2.1 Example Scenario and Assumptions .............................. 76
       4.2.2 Focus of Analysis ............................................. 77
   4.3 Abstraction Level 1: Decentralized Architectures .................... 78
       4.3.1 Geo-Processing Services as Basic Components ................... 78
       4.3.2 User-Coordinated Service Chaining ............................. 79
       4.3.3 Complexities of Nested Calls .................................. 81
       4.3.4 Issues and Implications ....................................... 83
       4.3.5 Aggregate Services ............................................ 84
   4.4 Abstraction Level 2: Federated Architectures with Catalogs .......... 85
       4.4.1 Catalogs for Service Discovery ................................ 86
       4.4.2 Service Discovery and Chaining ................................ 86
       4.4.3 Issues and Implications ....................................... 87
   4.5 Abstraction Level 3: Federated Architectures with Mediating Services 89
       4.5.1 Mediating (Smart) Services .................................... 89
       4.5.2 Mediating Services and Service Chaining ....................... 90
       4.5.3 Issues and Implications ....................................... 90
   4.6 Summary ............................................................ 92
5 Synthesis and Conclusions ................................................ 94
   5.1 Summary of Research ................................................ 94
   5.2 Navigation Framework (Roadmap) ..................................... 96
       5.2.1 Description of Framework ...................................... 96
       5.2.2 Applications of the Framework ................................ 102
   5.3 Implications on Required Standards and Protocols for Future Research 103
       5.3.1 Data Format and Exchange Standards .......................... 104
       5.3.2 Service Chaining and Data Passing ............................ 106
       5.3.3 Message Passing and Dialog Structure ......................... 108
       5.3.4 Other Implications ........................................... 110
   5.4 The Future of GIS Technologies and Markets ......................... 111
       5.4.1 Dynamics of the Future GIS Marketplace ....................... 111
       5.4.2 Challenges and Opportunities ................................. 113
Appendix A Traditional GIS Model .......................................... 115
Appendix B Map and Capabilities Request Specifications .................... 116
   B.1 Map Interface ..................................................... 116
   B.2 Capabilities Interface ............................................ 117
Appendix C Projection and Re-projection ................................... 121
   C.1 Projection Surfaces ............................................... 121
   C.2 Types of Projections .............................................. 122
   C.3 Interpolation/Resampling Methods .................................. 122
   C.4 Prototype Capabilities Summary .................................... 122
   C.5 Re-projection Approximation ....................................... 123
Bibliography .............................................................. 132
List of Figures

Figure 1.1 A simplified view of the service-centered GIS infrastructure ............. 12
Figure 1.2 Setup option 1: Client coordinates among needed services ................. 16
Figure 1.3 Setup option 2: Services chain, transparently to the user ................ 16
Figure 1.4 Service chaining and metadata tracking ................................... 17
Figure 1.5 The three perspectives of the research methodology ....................... 19
Figure 2.1 Evolution of GIS in response to enabling technological advances .......... 24
Figure 2.2 Key drivers for unbundling and distributing GIS .......................... 25
Figure 2.3 Simplicity of Internet standards and consensus for consistency ........... 42
Figure 2.4 Convergence of business and technology conditions ........................ 44
Figure 3.1 Raster imagery re-projection ............................................. 51
Figure 3.2 Original jpg image in the Mass State Plane reference system .............. 61
Figure 3.3 Image re-projected into lat/long using ArcInfo ........................... 61
Figure 3.4 Image re-projected to lat/lon using approximation method ................. 63
Figure 3.5 Prototype re-projection service: Internal flowchart ...................... 64
Figure 3.6 Chaining re-projection service with MITOrtho server: Interaction flowchart 65
Figure 3.7 Encoding geo-referenced imagery using XML ................................ 67
Figure 4.1 Illustration of services used in the example ............................. 77
Figure 4.2 User-coordinated service chaining in decentralized architectures ......... 80
Figure 4.3 Using nested calls for service chaining .................................. 81
Figure 4.4 Aggregate services ....................................................... 84
Figure 4.5 The role of catalogs in a federated setup ................................ 87
Figure 4.6 Service chaining with mediating (smart) services ......................... 90
Figure 5.1 The shift to distributed interoperable GIS components .................... 95
Figure 5.2 Sample applications positioned with respect to client and data dimensions 102
Figure 5.3 Example 1: Using XML to minimize data transfer in federated architectures 107
Figure 5.4 Example 2: Using XML to minimize data transfer in federated architectures 107
Figure 5.5 Reducing redundancy in capabilities retrieval ........................... 109
Figure 5.6 Potential value chain for the future GIS marketplace .................... 112
Figure C.1 Sample Projection Surfaces [89] ......................................... 121
Figure C.2 A rectangle in Mass State Plane Coordinates ............................. 124
Figure C.3 The reprojected rectangle in Lat/Lon .................................... 124
List of Tables

Table 1.1 Example criteria for querying services in catalogs ........................ 17
Table 2.1 The background at a glance ................................................ 22
Table 2.2 The three abstraction levels of interoperability issues ................... 29
Table 2.3 De Facto versus De Jure standards ......................................... 30
Table 2.4 Timing of standards ....................................................... 31
Table 2.5 Levels of standardization ................................................. 31
Table 2.6 Organizations involved in Internet coordination and standardization ....... 41
Table 2.7 Advantages of the ASP model ............................................... 44
Table 3.1 Preliminary raster re-projection service query parameters ................. 57
Table 3.2 Re-projection service capabilities parameters ............................. 58
Table 3.3 Sample re-projection times using ArcInfo .................................. 60
Table 3.4 Distinctive characteristics of the GIS problem ............................ 72
Table 4.1 Simplified service requests with input/output parameters .................. 80
Table 4.2 Summary of architectures: components, service chaining and issues ......... 93
Table 5.1 Framework Dimension 1: Application environment ............................ 98
Table 5.2 Framework Dimension 2: Data characteristics ............................... 99
Table 5.3 Framework Dimension 3: Service characteristics ........................... 100
Table 5.4 Framework Dimension 4: Client characteristics ............................ 101
Table 5.5 Framework Dimension 5: Standards ......................................... 101
Table B.1 The Map request .......................................................... 116
Chapter 1
Introduction
1.1 Overview
Geographic Information Systems (GIS) have been in existence since the late 1970's, primarily as
stand-alone applications used to "capture, model, manipulate, retrieve, analyze, and present geographically referenced data" [108]. Traditionally, these systems have been independently developed by organizations outside their Information Systems (IS) departments in response to specific
internal user needs. Unfortunately, this independent development resulted in GIS becoming fragmented and isolated from mainstream Information Systems, in addition to creating numerous
incompatible data formats, structures and spatial conceptions.
Today, the incompatibility problems are accentuated by the growing need for sharing increasingly available GIS data, and the proliferation of the Internet as an infrastructure for sharing and
distributing this data. Furthermore, in today's more networked world, new requisites are emerging
due to recent pushes for integrating GIS with other Information Systems, and for supporting
the requirements of mobile computing devices. It is now evident that the traditional model for designing,
delivering, and using GIS needs to be adjusted to accommodate the requirements of a more
dynamic networked environment.
This thesis is based on the premise that a new scalable and extensible infrastructure is needed
to support a more flexible model for delivering and using GIS. The next generation GIS model is
likely to deliver GIS functionalities over the Internet, as independently-developed, yet interoperable autonomous services. Such services are essentially processes that run on web servers, offering
fundamental GIS functions via standardized access interfaces. Under this model, potential users
have the option of combining and chaining services in order to construct customized solutions for
their problems, and to integrate them with their current systems. However, in an environment
where both services and data are constantly added, removed and updated, dynamic chaining of services will be a complex process. In order to understand how to manage this complexity, the thesis
introduces a set of coordination elements (such as catalogs and mediating agents), whose purpose
is to facilitate, simplify and possibly optimize the dynamic chaining of services.
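The interplay between services and a coordination element such as a catalog can be sketched in a few lines of Python. This is a toy model rather than an implementation from the thesis: the service names, the catalog structure, and the coordinate system identifiers are all illustrative assumptions.

```python
# A minimal sketch of dynamic service chaining through a catalog.
# The services, catalog keys, and spatial reference identifiers below
# are hypothetical, chosen only to illustrate the coordination idea.

def imagery_service(bbox):
    """Hypothetical imagery service: returns a raster layer for a bounding box."""
    return {"type": "raster", "bbox": bbox, "srs": "MA-StatePlane"}

def reprojection_service(layer, target_srs):
    """Hypothetical re-projection service: rewrites the layer's reference system."""
    return {**layer, "srs": target_srs}

# A catalog maps advertised capabilities to service endpoints, so that
# services can be discovered at run time rather than hard-wired into clients.
catalog = {
    "imagery": imagery_service,
    "reproject": reprojection_service,
}

def chain(bbox, target_srs):
    """Client-coordinated chain: fetch imagery, then re-project it if needed."""
    layer = catalog["imagery"](bbox)
    if layer["srs"] != target_srs:
        layer = catalog["reproject"](layer, target_srs)
    return layer

result = chain(bbox=(230000, 890000, 240000, 900000), target_srs="EPSG:4326")
```

Here the client itself walks the chain, which corresponds to the user-coordinated setup of Figure 1.2; the thesis also considers setups where services call each other directly or a mediating service coordinates on the client's behalf.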
One of the main goals of this work is to evaluate some available architectural choices and
associated issues and trade-offs for supporting a flexible infrastructure that is both scalable and
extensible in light of the complexities of service chaining. A significant challenge lies in identifying and prioritizing the issues that need to be addressed, both in the short run and the long run. In
addressing this challenge, current and anticipated GIS requirements (such as high bandwidth,
semantics and performance requirements) must be accommodated, with the understanding that the
only constants are rapid change and innovation.
The Internet and its technologies are at the heart of this research given the crucial role they
play in the push for unbundling and distributing geographic services. At the center of today's
dynamic environment, the Internet is the current de facto information infrastructure and standard
for communication and networking [20], and is one of the key drivers for the interest in a distributed geo-processing infrastructure. Accordingly, in this thesis, the Internet is assumed to be the
platform upon which the geo-processing infrastructure is built. With its tremendous growth over
the last decade, its global reach, simplicity and ease of use, as well as its tested and evolving technologies, the Internet indeed provides a fertile environment for delivering and accessing specialized geo-processing services.
It is important to clarify at this point that delivering and accessing services over the Internet does not necessarily make those services public to all. In fact, Internet technologies are increasingly being used as default platforms in today's organizations for distributing applications and
information, both within the enterprise (Intranets) and among their partners (Extranets). Therefore,
organizations can provide Internet-based geo-processing services internally to their employees,
and externally to their partners, without necessarily making them available on the global Internet.
1.2 Motivation
The motivation for this thesis stems from the interest in understanding how recent developments in
the Information Technology field can enhance the design, delivery and use of GIS technology. In
the last decade, the IT field has witnessed the exponential growth of the Internet, the growing popularity of handheld and mobile computing, countless implementations of Enterprise
Resource Planning systems, and the introduction of new concepts such as Intranets, Extranets, and
Application Service Providers. These emerging technologies are having a direct impact on today's
business environment and are changing users' expectations and requirements for the technology.
In light of the current IT developments, the traditional model for delivering and using GIS falls
short of accommodating the new users' requirements. Unbundling GIS and distributing their functionalities as interoperable services on a network (such as the Internet) promises to lead to a better
service model for current and potential GIS users. In the next sections, we show how the distributed model can indeed meet the new requirements, and how it is currently materializing with a
push from various organizations and experts in the GIS field.
1.2.1 Benefits of a Distributed GIS Infrastructure
Anyone who has used GIS in the last two decades is probably familiar with what is referred to
in this thesis as the traditional model of using GIS (See Appendix A). This model requires the
users to first get trained on a specific GIS package (such as ArcInfo) before using its spatial analysis and visualization tools in their projects. Moreover, the analysis stage of a project is typically
preceded by a lengthy data preparation phase. This phase often includes finding and buying CD-ROMs of data and imagery for areas covering or overlapping with the project's area of interest,
converting the data into formats usable by a GIS package of choice, and extracting, from the data
collected, a subset that is directly relevant to the particular project. Only after this lengthy data
assembly and extraction process is the user ready for performing the analysis for the project.
As a result, it is estimated that 60% to 85% of the cost of a typical GIS implementation project
is due to the data conversion and integration step [48]. Furthermore, for most purposes, users only
use a fraction of the functionalities offered by their GIS package [50]. Therefore, in most cases, the
costly packages are under-utilized. Nowadays, the growing amount of data available in a variety of
incompatible formats, and the broadening base of users (many interested in GIS applications in
new fields) imply that a more flexible model is needed for delivering on-demand easy-to-use subsets of GIS data and functionalities.
The new model (Figure 1.1) is based on the concept of unbundling functionalities in current
stand-alone GIS systems into interoperable autonomous components/services. The unbundled
functionalities can be used individually (or in sets) to perform specific tasks with little training.
Such a model also facilitates the integration of these services with other Information Systems in
organizations, a trend that is on the rise given the recent wave of Enterprise Resource Planning
implementations and the increasing recognition of the utility of spatial data in today's businesses
[70].
[Figure 1.1 here: clients — mobile devices retrieving an image centered at the current location, browsers retrieving several overlayed images as one, and information systems performing business analysis based on customers' geographical distribution — access bundled, overlay, imagery, vector data, and address-match services through standardized interfaces.]

Figure 1.1 A simplified view of the service-centered GIS infrastructure.
Today's demand for individual GIS functionality is also fueled by a growing market of handheld and mobile computing devices, which require only a subset of functionalities at a time and are limited in their processing power, memory and bandwidth. Nowadays, the availability of inexpensive hardware, standardized Internet communication protocols, and the ease of connecting GPS units to handheld devices are making these devices more accessible to many parties. Indeed,
many companies in the transportation, telecommunications, and agriculture fields view the use of
handheld devices and location-based services as necessary for gaining a competitive edge and
increasing productivity [105]. A model that allows handheld devices to have access to GIS services via standardized interfaces will enable these devices to broaden the range of applications
they can support in the field. This could be achieved by having the device connect to a specialized
server via a simple protocol, which, depending on the query, conveniently returns only the data
that is needed at a particular time for a particular location.
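As a sketch of this pattern, the following Python fragment builds such a query for an image centered at the device's current GPS fix. The endpoint and parameter names here are illustrative assumptions, not part of any actual service interface:

```python
from urllib.parse import urlencode

def build_map_request(base_url, lon, lat, width_px, height_px, span_deg, fmt="jpeg"):
    """Build a query URL for an image centered at the device's GPS fix.

    The endpoint and parameter names are illustrative assumptions, not
    part of any actual service interface."""
    half = span_deg / 2.0
    params = {
        # bounding box centered on the current location
        "bbox": "%s,%s,%s,%s" % (lon - half, lat - half, lon + half, lat + half),
        "width": width_px,    # pixel dimensions suited to the device screen
        "height": height_px,
        "format": fmt,        # a compact raster format for low bandwidth
    }
    return base_url + "?" + urlencode(params)

# e.g. a handheld near MIT requesting a small image of its surroundings
url = build_map_request("http://example-ortho-server/map", -71.09, 42.36, 240, 180, 0.01)
```

A real deployment would follow whatever parameter vocabulary the service's published interface defines; the point is only that a thin client needs little more than URL construction to participate.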
In summary, an infrastructure that supports a network of distributed interoperable and autonomous GIS services promises to address the emerging requirements of today's users. It saves the
growing number of GIS users from the burden of buying, setting up, learning and maintaining a
full GIS package. It also enables organizations to better share and make use of available data as
they can access and chain the available services to transform or process any data to fit their interests [76]. Finally, such a service-oriented infrastructure allows clients (including handheld and
mobile devices) to focus on visualization and interaction with data while leaving the computationally intensive tasks and data management to the services. The proposed infrastructure hence provides users with the flexibility they currently seek as a result of today's extensive business changes
and rapid technology advances. It also offers them a much needed breadth of independently-provided services that they can freely mix and match to create better customized applications, with
minimal in-house development.
1.2.2 The Significance of Scalability and Interoperability
Given the description of the service-centered infrastructure presented thus far, this thesis
focuses on efficient and scalable mechanisms for facilitating the chaining and interactions of the
dynamic services. We argue that there are two reasons why this focus is timely and particularly relevant for the future success of this infrastructure.
First, given the recent advances in networking, distributed computing and databases, it is now
technologically feasible to implement the various autonomous services and make them available to
users via standardized interfaces over the Internet. Indeed, the construction of the infrastructure is
made possible by the evolution of infrastructures and middleware technologies that support distributed and client/server computing (such as RMI, CORBA, DCOM, ODBC, JDBC). In addition,
the infrastructure can employ the best of today's Internet technologies, including TCP/IP, HTML,
XML, Java as well as standard web practices for security/authentication. The availability of these
technologies today helps shift the focus from the task of building and providing the services from
scratch, to addressing higher-level infrastructure issues, such as scalability, interoperability and
extensibility in the distributed environment. Indeed, the infrastructure will be usable and sustainable in the long run only if the distributed setup can scale with the number of users and services.
The second reason for focusing on efficiency and scalability is the fact that the service-centered infrastructure is a common "vision" in the GIS community. Indeed, this new model for delivering GIS is a frequent discussion topic in today's GIS literature (refer to References) and is being
advocated by the Web Mapping group of the Open GIS Consortium [78]. This is a result of an
increasing interest in web mapping, a growing acceptance of the Application Service Provider
(ASP) model and the availability of technologies that support such an infrastructure. Several working examples currently exist and are successfully being used. In this section, we introduce four such services.
- The MITOrtho Server Technology
The MITOrtho Server (http://ortho.mit.edu) technology is a product of several years of
research at the Computer Resource Laboratory at MIT. Through an easy-to-use interface, users
can extract the orthophotos¹ at the resolution and viewport they need. The interface (which is
currently compliant with the Web Mapping Testbed specification) can be used in a variety of
applications (including web browsers and GIS packages) to produce customized snippets of
orthophotos from multiple Gigabytes of images archived on the server. Other compliant map
servers can be found at http://www.opengis.org.
- The Etak EZ-Locate Service
The Etak EZ-Locate Service (http://www.etak.com) is an online service that provides real-time access to the Etak Geocoding service over the Internet. The technology consists of an address matching engine, which returns, using a currently proprietary interface, the most accurate match(es) for the street address(es) requested by the user or the application.
- The Microsoft TerraService
The Microsoft TerraService (http://terraserver.microsoft.com) is a programmable interface to TerraServer's online database of high resolution USGS aerial imagery and scanned USGS topographic maps. The service uses the Microsoft .NET framework to provide users with
methods for performing standard queries against the TerraServer databases.
- The MapQuest LinkFree Service
The MapQuest LinkFree service (http://www.mapquest.com) allows users to create maps
for any address they choose. In the form of a URL, users can specify parameters such as the
address to be mapped and the desired dimensions, style and zoom level of the map.
These examples are introduced here to show that the construction of the cited infrastructure
has actually begun, and with the emergence of more of these services by independent parties, its
growth is becoming evident. For this reason, the thesis emphasizes the importance of addressing
scalability and interoperability issues at this stage, before more of these independently-provided
services become available, making it harder to find common grounds for standardization and
acceptance. Without standardization, the sustainability of the infrastructure in the long run will be
jeopardized.
In the next section, an example scenario is presented. Although seemingly simple, the scenario shows that the complexity of the available options grows much faster than one might expect.
1. Orthoimages are raster images that have been analytically rectified for tilt and relief, so that every location on the map is in its true geographic position [38]. They are often used as backdrops to vector data to validate accuracy, and as visualization enhancers that allow users to associate their data with the natural geography of areas of interest, as well as for contour generation and digitization [8]. They are used in a wide variety of applications, including urban and residential infrastructure, transportation, agriculture, forestry, and environmental monitoring, to list a few [54].
1.2.3 The Sources of Complexity
Controlling the rapidly growing complexity of dynamic chaining and coordination of services
in the distributed infrastructure, without compromising performance and scalability, is the main
motivation of this thesis. Chaining occurs when a request sent by a client cannot be provided by a
single service, but rather by combining or pipelining results from several complementary services.
The thesis examines the alternatives for efficient and practical options for the dialogs that occur
among the individual services, and between the services and the calling client.
To give the reader a feel for this complexity, consider the following example which uses the
MITOrtho and Etak servers mentioned in Section 1.2.2. In this simple example, assume that the
client application's objective is to retrieve an orthoimage that is centered around a variable address
supplied by the user. In order to achieve that objective, the client would, either directly or indirectly, use the Etak service for matching the address with a geographic location, and subsequently,
the MITOrtho service to get the corresponding image.
This seemingly simple example is complicated by several issues.
- Dialog Issues: How should the dialog between the three parties in this example evolve?
Should the client coordinate between the requests to the two services by first sending the address
to the Etak service, picking the desired location from a list of possible results returned by the service, and then sending the location to the MITOrtho service (see Figure 1.2)? Or should the two
services somehow communicate, by having the client send the address directly to the MITOrtho
service, which, transparently to the user, would use the Etak service to match the address with a
geographic location that can then be used to get the appropriate image (see Figure 1.3)? Or better
yet, should there be a third service responsible for coordinating between these two services, and
minimizing the amount of data transferred between services?
- Coordinate Systems Issues: What if the location returned by the Etak service is in a coordinate system that is not supported by the MITOrtho service? Should the latter service return the
next best thing to the client, perhaps a locally reprojected image? Or should the MITOrtho service
automatically make use of another re-projection service, capable of transforming the image from
the local coordinate system to the one requested by the user? If so, should the user know about the
image re-projection (in which case, its quality might be compromised)? Or should the user have
the option to specify a re-projection service to be used as well as various re-projection parameters?
1. Client sends address to address matching service.
2. Address matching service returns geographic coordinates corresponding to the received address.
3. Client sends the coordinates, along with other parameters such as image size, resolution and format, to the MITOrtho service.
4. MITOrtho service processes the request and returns an image centered at the desired location.
Figure 1.2 Setup option 1: Client coordinates among needed services.
1. Client sends the request for an image centered at the given address.
2. MITOrtho service, transparently to the client, sends the address to the Etak service to get the matching coordinates.
3. Etak service sends back the geographic coordinates corresponding to the address.
4. MITOrtho service delivers an image centered at the address.
Figure 1.3 Setup option 2: Services chain, transparently to the user.
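The two setups can be contrasted in a few lines of Python. The service stubs below (their names, arguments and return values) are invented for illustration; neither Etak nor MITOrtho exposes such calls:

```python
# Stub services: the function names, arguments and return values are
# invented for illustration; neither Etak nor MITOrtho exposes such calls.

def geocode(address):
    """Stand-in for the Etak address-matching service."""
    known = {"77 Massachusetts Ave, Cambridge MA": (42.3592, -71.0935)}
    return known.get(address)

def get_image(coords, size=(320, 240)):
    """Stand-in for the MITOrtho imagery service."""
    return "image centered at %s, size %s" % (coords, size)

# Setup option 1 (Figure 1.2): the client coordinates both requests itself.
def client_coordinated(address):
    coords = geocode(address)      # steps 1-2: address -> coordinates
    return get_image(coords)       # steps 3-4: coordinates -> image

# Setup option 2 (Figure 1.3): the imagery service chains to the geocoder,
# transparently to the client, which sends only the address.
def image_by_address(address):
    return get_image(geocode(address))   # chaining hidden inside the service

def service_chained(address):
    return image_by_address(address)     # the client makes a single request
```

Both options return the same image; they differ in who carries the coordination logic and how much intermediate data crosses the network back to the client.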
- Metadata Tracking Issues: In the case where the MITOrtho service employs a re-projection service, or in the case where the MITOrtho service itself assembles the images from various other services, how should the chain of requests leading to the final image be relayed back to the client (Figure 1.4)? How would the user know where the data came from, and how/where it was transformed en route? How could the user be given control over the path of the data, if the user so desires?
- Caching, Semantics and Authentication Issues: What if the MITOrtho service updates its
images on a daily basis? What happens when the user makes the same request tomorrow as the one
made today? Should the service send the latest data by default, or should there be a way, through
some dialog between the service and the client, for the service to return to the client the same
image that was requested earlier? Where should the state of the client be maintained? How should the design accommodate cases in which the same service might return a different result (perhaps the same image at a different resolution, perhaps the same image but at a different price) depending on the requesting client, as authenticated?
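One conceivable answer to the repeat-request question is for the service to tag each response with the snapshot it was served from, and for the client to echo that tag back when it wants the earlier image again. The sketch below illustrates the idea; the class, its methods and the "as_of" tag are all assumptions made for illustration:

```python
# A minimal sketch of request pinning: the service tags each response with
# the snapshot date it was served from, and the client echoes that tag back
# to retrieve the same image later. The class, its methods and the "as_of"
# tag are all assumptions made for illustration.

class ImageService:
    def __init__(self):
        self.snapshots = {}        # snapshot date -> imagery payload
        self.latest = None

    def publish(self, date, payload):
        """Simulate the service's daily image update."""
        self.snapshots[date] = payload
        self.latest = date

    def get(self, bbox, as_of=None):
        """Serve the pinned snapshot if the client names one, else the latest."""
        date = as_of if as_of is not None else self.latest
        return {"as_of": date, "image": "%s covering %s" % (self.snapshots[date], bbox)}

svc = ImageService()
svc.publish("2001-01-18", "imagery v1")
first = svc.get("42.35,-71.10,42.37,-71.08")                         # served from v1, tagged
svc.publish("2001-01-19", "imagery v2")                              # overnight update
repeat = svc.get("42.35,-71.10,42.37,-71.08", as_of=first["as_of"])  # same image as before
```

This keeps the client stateless apart from the tag it chooses to remember, at the cost of the service retaining older snapshots for as long as it honors pinned requests.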
Figure 1.4 Service chaining and metadata tracking. [In the figure, the MITOrtho service assembles an orthoimage from several imagery provider services; a reprojection service reprojects the image from one coordinate system to another; a vector data provider service returns a certain layer at the specified extent; and an overlay service overlays the input image and the vector data and sends the overlay to the client.]
- Service Management Issues: Suppose there are several MITOrtho-like services available on the infrastructure; what is the best method for finding services needed by the client that meet certain criteria (performance, price, quality, location, etc.) (see Table 1.1)? What type of coordinating elements can be introduced to the infrastructure to facilitate this task without compromising
scalability or performance? What is the role of catalogs and how do they integrate with the rest of
the services?
- Best Quality or Closest Matching Data: Get an image that has not undergone transformations that might have degraded its quality.
- Fastest Response: Use an approximation for projecting the image to the desired coordinate system in order to minimize processing time.
- Most Reliable Sources: Use data stored by a government entity or a certified organization.
- Least Expensive Transactions: Use free or demo services instead of subscribing to fully functional ones.
- Least Number of Hops/Chains: Use the minimum number of links to get the desired data.
Table 1.1: Example criteria for querying services in catalogs.
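To make the table concrete, the fragment below sketches how a catalog might rank candidate services against a few of these criteria. The record fields (price, hops, certification, latency) are invented attributes for illustration; a real catalog schema would be defined by the infrastructure's standards:

```python
# Hypothetical catalog records: the fields (price, hops, certified,
# latency_ms) are invented attributes for illustration; a real catalog
# schema would be defined by the infrastructure's standards.
services = [
    {"name": "OrthoA", "price": 0.0, "hops": 3, "certified": False, "latency_ms": 900},
    {"name": "OrthoB", "price": 5.0, "hops": 1, "certified": True,  "latency_ms": 250},
]

def least_expensive(catalog):
    return min(catalog, key=lambda s: s["price"])

def fastest_response(catalog):
    return min(catalog, key=lambda s: s["latency_ms"])

def fewest_hops(catalog):
    return min(catalog, key=lambda s: s["hops"])

def most_reliable(catalog):
    return [s for s in catalog if s["certified"]]
```

Note that the criteria can conflict (here the free service is also the slowest), which is precisely why the infrastructure needs coordinating elements that let clients state which criterion matters most.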
This subset of issues illustrates the complexity of dialogs among services and clients even in
simple cases. Resolving these issues calls for a careful choice of design options for the infrastructure, with scalability (in terms of number of users, services and interactions), extensibility (in
terms of accommodating newer technologies and services), and interoperability in mind. This thesis addresses a subset of these issues as explained in the next section.
1.3 Objectives and Contributions
In this thesis, we analyze some of the issues pertaining to the design of the service-centered infrastructure described thus far, with an emphasis on the scalability and sustainability of the infrastructure, the amount of technology effort required to support it, and how these fit into a broader
view of interoperability. In particular, the thesis focuses on the aspects of the infrastructure related
to the dynamic chaining of services, the traceability of metadata and services in transactions, and
the introduction of middleware type services for mediation and management.
Given that the services can be combined in ways that are not pre-defined by their providers,
and that only minimal assumptions can be made about the clients, the emphasis is on interoperable
interfaces of the services, and efficient dialog structures among them. In the process of sorting through technological choices and trade-offs, efficiency and performance are the primary constraints, especially in terms of minimizing bandwidth usage, maintaining the thinness of the clients and the flexibility of the services, and enabling the back-tracing of metadata and intermediate data transformations.
The objective of meeting the scalability, extensibility and efficiency constraints in today's rapidly evolving technology environment is the key challenge of the thesis. Although today's technologies provide the basis for building the service-centered geoprocessing infrastructure, their rate of
innovation and change makes it more challenging to find a practical and long lasting framework
for building a sustainable infrastructure.
Given these challenges and the uncertainty of the future, it is important to note that this thesis
does not attempt to provide a definite answer or recommend a specific solution. The contribution
of this work comes from taking a rather broader perspective, and helping identify and prioritize the
issues believed to be the most critical to address for the success of the geoprocessing infrastructure in this environment. As for technologies needed to support the infrastructure, the objective of the thesis is not to develop new technologies for that purpose, but rather to harness the potential of current and emerging mainstream IT technologies and to apply them in the context of a specialized GIS infrastructure. Indeed, one of the goals of the thesis is to isolate the issues
and constraints that are specific to a GIS infrastructure and the nature of GIS processing.
This research can be considered as complementing the current interoperability efforts within
the OpenGIS Web Mapping group [78], whose current focus is on determining standard interface
specifications for web mapping. With the group running via a consensus-based process, and consisting of organizations struggling with their own legacy issues, it is not surprising that, in spite of
the group's rapid progress, the issues related to the chaining complexities of the infrastructure have
only recently begun to surface. In this light, this thesis is timely since it can build on the efforts of
the Web Mapping group while moving beyond the simple interfaces to study the dynamic interactions among the services. The findings will hence assist in anticipating the next round of issues
related to web mapping as defined by the Web Mapping group. The next section describes the
methodology followed in identifying these issues.
1.4 Research Methodology
In order to identify and present, in a structured manner, the fundamental issues addressed by this
thesis, the research is approached from three different perspectives, as depicted in Figure 1.5. Each
perspective contributes to the final recommendations by either complementing or reinforcing findings from the other two perspectives. This approach allows us to easily highlight crucial issues and
focus the discussion around them.
Figure 1.5 The three perspectives of the research methodology. [The three perspectives, namely (1) looking at existing technologies and efforts, (2) learning by doing through a prototyping experiment, and (3) identifying basic elements and setups, all feed into a common framework of issues and choices.]
We start by looking at the dynamics of existing technologies and efforts in order to draw basic
lessons from similar settings, and consequently assemble a preliminary set of essential issues and
choices. In light of these issues, we then move to implement a prototype experiment which sheds
more light on the complexities of interoperable service design as well as the feasibility of service
chaining in practice. Finally, using a simple yet non-trivial example derived from the prototyping
experiment, we address additional issues in the context of investigating alternative architectural
setups.
1.4.1 Looking at Existing Technologies and Efforts
The first stage of this thesis consists of assembling a preliminary set of issues, building the
vocabulary to use, and positioning this research with respect to other efforts in the GIS field.
Towards that goal, we analyze the forces driving the demand for a distributed geo-processing
infrastructure. This step is followed by a brief study of the interoperability and standards literature,
to give an overview of the general issues. These issues are complemented by lessons learned from
looking at two contemporary examples of distributed service-oriented infrastructures: the Internet and the growing field of Application Service Providers (ASPs).
1.4.2 Learning by Doing: A Prototyping Experiment
Prototyping is used in this thesis to get a deeper understanding of the constraints and requirements of designing interoperable services and chaining them. In this case, the prototyping effort
centers around the design and development of a geo-referenced raster image re-projection service.
The reasons for selecting this task for the prototyping experiment are explained later. This prototyping experience helped us identify some issues that, while not entirely unanticipated, were hard to conceptualize without testing them in an operational setting.
1.4.3 Identifying Basic Architectural Elements and Setups
By this stage, we are equipped with a deep understanding of the problem and a rich set of preliminary issues. We use this knowledge to focus the discussion through a carefully selected example, which despite its simplicity, provides the right level of complexity for studying service
chaining. The example is used to identify candidate architectures and study, in greater detail, service chaining and its repercussions on the use and scalability of each architecture. At this stage, the
thesis will be often borrowing and adapting choices and configurations from other IT areas such as
distributed databases, search engine technologies, JINI technology, W3C protocols, etc. By incorporating such emerging mainstream technologies and standards into our analysis, we hope to
arrive at solutions that can be easily integrated with the rest of IT, hence lessening the isolation of
GIS.
1.5 Thesis Organization
The organization of this document follows directly from the research methodology described
above. Chapter 2 presents the results of our attempts to learn from relevant technologies and
efforts. It includes the background research that we feel is needed to support the development of
this thesis. In Chapter 3, we describe our experience with building a prototype of a re-projection
service for geo-referenced raster imagery. We use this prototyping experience in Chapter 4 to
refine a framework for identifying the requirements for a scalable distributed infrastructure, for
describing the operations of some of its basic elements, and exposing the first set of general issues
that arise. This framework is applied within the context of a carefully selected example to explore
architectural alternatives for the distributed geo-processing infrastructure. Finally, Chapter 5 presents a synthesis of the issues explored in the thesis, and their relationship with other ongoing
efforts. We also conclude with a speculation on the future of the GIS field assuming a successful distributed infrastructure, and present recommendations for further studies.
Chapter 2
Background
The purpose of this chapter is to assemble a preliminary set of issues relevant to our discussion
of a distributed geo-processing infrastructure. The chapter outlines our background research on
related efforts and technological trends in the GIS and IT fields. It also positions this work and its
contribution relative to the efforts presented. Table 2.1 provides an overview of the topics covered in this chapter and their relevance to our discussion.
- 2.1 The Evolution of GIS: Needed to understand the driving forces behind the need for a distributed geo-processing infrastructure, and hence the major issues, requirements and constraints pertaining to building it.
- 2.2 Interoperability and Standards: Identified as being key to the success of a scalable distributed infrastructure.
- 2.3 Ongoing Standardization Efforts: Needed to relate the thesis content with ongoing formal efforts in the area of standardizing and distributing GIS.
- 2.4 The Internet: Recognized as a successful case of a distributed system environment. Discussion focuses on the success factors and the possibility of applying them to the GIS case.
- 2.5 Application Service Providers (ASP): Introduced as a new model for delivering IT solutions to organizations. Discussion focuses on the trends and issues facing ASPs and their similarities with those explored in the thesis.
Table 2.1: The background at a glance.
2.1 The Evolution of GIS
In preparation for assembling the principal issues pertaining to the distributed geo-processing
infrastructure, we develop an understanding of the key factors and technologies that have influenced and enabled its evolution. In this section, we briefly present this understanding of the stages
along with some examples of key factors and technologies.
2.1.1 Legacy GIS Systems
Originally, Geographic Information Systems were developed independently by software vendors, who tailored their applications to their specific users' needs, using locally created terminologies and approaches [47]. Constrained by the inability of regular databases to accommodate geodata requirements, GIS vendors typically resorted to creating their own data structures that operated within proprietary file management systems [70]. On one hand, this separation of geodata from traditional databases led to the isolation of GIS technology. On the other hand, the complexity of these systems confined the user base to a small number of GIS professionals.
With GIS data tightly integrated into the systems used to create it, and gathered according to
different resolutions and scales [70], it became more difficult to share and re-use this data across
departments and disciplines. This sharing and re-use problem is particularly limiting given geodata's potential for re-use across independent needs and applications, its added value in spatial
analysis, and its associated high reproduction costs [35].
However, with the continuous emergence of new technologies and a growing investment in
interoperability, the GIS technology has gradually transformed, as discussed next.
2.1.2 Influences of Emerging Technologies
The rapid developments in Information Technologies over the last two decades have accelerated the expansion of GIS in terms of both user and application bases (see Figure 2.1).
For example, the declining cost of hardware, software and data over the years triggered the
introduction of desktop mapping packages (such as ArcView and MapInfo). At the same time, GIS
vendors attempted to increase their market penetration by focusing on simplifying the process of
using a GIS. Moreover, the growth of the Internet and the accompanying advances in communications, middleware and component-based technologies led to a boom in web mapping and a new
breed of GIS products (such as MapObjects) providing embeddable GIS components [76].
Figure 2.1 Evolution of GIS in response to enabling technological advances. [The figure traces the evolution from stand-alone GIS systems on workstations, with data tightly integrated with the GIS system, through desktop mapping and embeddable GIS components, to databases with spatial extensions (SDE), distributed and web mapping, and finally GIS unbundled into autonomous interoperable services, with the GIS user and application bases expanding at each stage. Enabling advances include lower PC costs and faster processing, client/server computing and middleware (CORBA, etc.), the Internet (HTML, Java, etc.), XML and ERPs, and wireless and mobile computing.]
Similarly, the move towards client/server applications and the growth of data warehousing led
to the surfacing of products such as the Spatial Database Engine, as well as the emergence of spatially enabled databases such as Oracle Spatial, Informix Universal Server (Datablade) and Sybase
Adaptive Server Enterprise [70].
This trend of technology influences currently continues with what we consider to be four key drivers (portrayed in Figure 2.2) to unbundle and distribute GIS, namely:
- The growing availability of spatial data, as discussed in Section 2.1.3
- The proliferation of the Internet, as discussed in Section 2.1.4
- The new role of GIS in today's enterprises, as discussed in Section 2.1.5
- The advances in mobile and wireless technologies, as discussed in Section 2.1.6.
Figure 2.2 Key drivers for unbundling and distributing GIS. [Four drivers push towards an infrastructure of distributed autonomous interoperable GIS services: the Internet (web mapping, push for interoperability), the availability of spatial data (satellite imagery, variety of suppliers and distributors), the role of GIS in enterprises (integration of databases, push for components), and mobile and wireless technologies (location-based wireless services, services for thin clients with bandwidth constraints).]
2.1.3 Availability of Spatial Data
In the last decade, the rate of GIS data collection (especially of the raster type) has significantly increased, due to advances in technologies such as high resolution satellite imaging systems
and GPS. Indeed, many firms (both national and international) are entering the high-resolution satellite imagery business. For instance, in 1999 and 2000, three US commercial companies (Space
Imaging, OrbImage, and Earth Watch) were scheduled to launch satellite imagery systems
with one-meter panchromatic and four-meter multi-spectral resolutions [40]. More data is also
expected to be soon available from NASA, which, by virtue of the recent communication satellite
competition and privatization act, will be allowed to sell some of its images and purchase from
other trusted commercial sources [54].
With the expected abundance of imagery data, many companies are eager to create imagery distribution channels, many using the Internet. MapFactory (http://www.mapfactory.com), for example, is in the process of creating the largest geo-referenced digital imagery archive on the web. Microsoft, on the other hand, has established the TerraServer (http://terraserver.microsoft.com), which contains USGS photographs of more than 30% of the US as well as satellite imagery of other parts of the globe.
As more non-traditional customers recognize the potential uses of geo-imagery in their businesses and as more data providers become available, the need for a scalable infrastructure of geoservices becomes increasingly pressing. Given the collective size of this data and the diversity of
its potential uses, having central repositories of this data accessible within such an infrastructure
offers many benefits to users and developers of GIS applications. On one hand, the distributed
solution spares users and developers the burden of maintaining the images, and on the other hand it
frees them to focus their efforts on developing tools to effectively use and integrate this data.
It is evident from the discussion so far that the Internet has played an important role in the race
for distributing the collected imagery data as well as other GIS data, as discussed next.
2.1.4 Impact of the Internet
The Internet has had a tremendous impact on information technologies and businesses. In the
case of GIS, this relatively accessible and reliable communication infrastructure has accelerated
the interest in sharing spatial data [25]. Indeed, the Internet can be credited with pushing GIS to the
forefront by providing a platform for a growing number of web mapping sites and applications.
Despite the increasing popularity of using the Internet to distribute and browse spatial data, it
remains difficult to integrate data returned by various web mapping sites. For this to happen, data
exchange and application interface standards are needed to access, view and integrate the available
data in a consistent fashion [83]. In this sense, the Internet can also be seen as a major driver for
the interest in GIS interoperability and standards. This interest is shared by GIS and database vendors, who also face difficulties integrating GIS data with other databases and systems, as discussed in the next section.
2.1.5 Emerging Role of GIS in Today's Enterprises
Over the last decade, enterprises have discovered that greater knowledge and efficiencies can
be achieved by integrating systems and databases [36]. As competitive pressures for growth intensify and accordingly Enterprise Resource Planning (ERP) systems gain momentum in today's
organizations, several departments (like marketing, finance and customer service departments) are
seeking new ways to enhance their operations through geospatial analysis. Spatial data is increasingly recognized as a corporate enterprise asset, which can provide the basis for more effective
activity planning and development for the enterprise [97]. With about 80% of all data in the
world's databases containing a spatial element or reference (such as zip codes, customer addresses,
warehouse locations, etc.) [18], there has been increasing gravitation towards integrating GIS databases into the organizations' ERP systems. Thus, both organizations and ERP systems providers
are nowadays looking for ways to incorporate, into their systems, flexible mapping components
that can be customized with little difficulty or training. This interest in unbundled GIS components
in turn strengthens the need for a service-centered model of GIS.
2.1.6 Mobile and Wireless Technologies
A carefully designed distributed service-centered model can also play a critical role in the
emerging mobile services industry. Advances in microprocessor-embedded GPS circuits, handheld
computing and wireless networking have led to a new generation of PDAs (such as Palm Pilots,
Windows CE, and Visor devices) that can incorporate the functions of a GPS receiver [75], and
hence can provide a variety of location-based services to the customer. Interest in these services is high: their revenues are projected to rise from less than 30 million dollars in 1999
to 3.9 billion dollars in 2004, according to a study performed by the Strategis Group, a Washington
DC consulting firm that specializes in wireless markets [64]. The dependence of these location-based services on GIS presents several practical challenges in terms of bandwidth, latency and
online connectivity. As argued in Section 1.2.1, a distributed geo-processing infrastructure can
greatly serve this market segment by facilitating access and integration of data sources and services.
Since the infrastructure is to accommodate the aforementioned GIS, ERP and wireless markets and their diverse users and constraints, while still meeting scalability requirements, interoperability of data and services is critical. Interoperability can be achieved in many ways and involves its own set of interesting challenges and trade-offs, as covered in the next section.
2.2 Interoperability and Standards
Interoperability and standards play an important role in the information era. From bus structures to
QWERTY keyboards, IT standards are viewed as both drivers and reflections of the growth and maturity of the IT industry [19]. In the case of the distributed GIS infrastructure, interoperability and
standards are essential considering that the success of the infrastructure is dependent on the ease
with which users can access, interchange and freely mix-and-match independently-provided services.
This section provides an overview of interoperability, standards and general standardization
processes. It is intended to familiarize us with the issues that we are likely to face when discussing
alternative architectures for a distributed geo-processing infrastructure.
2.2.1 Interoperability
The need for interoperability in IT comes as a result of the existence of multiple heterogeneous
systems within organizations, the desire to transparently share information and processes among
these systems, and the need to combine autonomous components for the provision of larger applications [59]. Today, interoperability is widely recognized as a new paradigm for linking these heterogeneous systems and facilitating a more efficient use of the computing resources within and
among organizations. The next sections present common definitions of interoperability, its advantages and some commonly recognized issues.
Interoperability: Definition
Given our extensive use of the term "interoperability" in this thesis, we start with a definition
of the word. The IT literature shows that there are many coexisting and complementary definitions
of interoperability. To begin with, interoperability is a key characteristic of an open system, which
is defined by the IETF (Internet Engineering Task Force) [90] as
a system that implements sufficient open specifications for interfaces, services, and supporting formats to enable properly engineered applications software to be ported across a wide range of systems with minimal changes, to interoperate with other applications on local and remote systems, and to interact with users in a style that facilitates user portability.
In other words, interoperability is a property of multi-vendor, multi-platform systems (or sub-systems) that allows them to interact with each other through the interchange of data and functions
[67]. Litwin [72] stresses the autonomy of these systems with the following definition:
Interoperability refers to a bottom-up integration of pre-existing systems and applications that were not intended to be integrated but are systematically combined to address problems that require such an integration.
Interoperability is hence a property of vendor-independent systems or components that enables
them to transparently access, interchange, and integrate data and services among each other.
Interoperability: Advantages
According to the literature, interoperability provides technology users with
1. the freedom to mix and match components of information systems without compromising
overall success [66],
2. the ability to exchange data freely between systems [47], and the ability to integrate
information [99],
3. the ability to request and receive services between inter-operating systems and use each other's functionalities [36].
By facilitating information access, integration and sharing in addition to inducing competition
among providers [71], interoperability leads to broadened product acquisition opportunities [88],
reduced project costs, faster project life-cycles and flatter learning curves for new systems [47]. It
also provides a more flexible computing environment for developers, as it facilitates application
development, management and maintenance and frees developers to focus on adding value in their
areas of expertise [59]. In the specific case of GIS, interoperability advantages also include overcoming tedious batch conversion tasks, import/export obstacles, and distributed resource access
barriers imposed by the heterogeneous GIS processing environments and data [18].
Interoperability: Issues
The interoperability literature suggests that issues related to interoperability can be studied at
different levels of abstraction [46], as shown in Table 2.2.
- Technical: Format compatibilities, removal of detail of implementation, development of interfaces and standards.
- Semantic: Domain-specific definitions and sharing of meaning. This layer is more problematic than the technical one.
- Institutional: Influence of organizational forces on the success of interoperability solutions [34]: willingness of organizations to cooperate, as well as behavioral, economic and legal factors that affect the participation of parties.
Table 2.2: The three abstraction levels of interoperability issues.
Depending on the problem at hand, the prioritization of these issues largely determines the approach followed towards achieving interoperability in that setting. As each problem is unique in
its combination of requirements and constraints, it follows that there is no unique "right" approach
to achieving universal interoperability. All approaches however share the challenge of satisfying
current requirements while simultaneously being able to easily adapt to evolving user needs and
technological changes [56], as emphasized in the next section about standards.
2.2.2 Standards
Standards represent "the deliberate acceptance by an organization or a group of people who
have a common interest, of a quantifiable metric for comparison, that directly or indirectly, influences the behavior and activities of a group by permitting and (possibly) encouraging some sort of
interchange" [19]. Standards exist to enable interoperability, portability and data exchange [71].
This section provides an overview of the basic characteristics that make a "good standard", highlighting the available choices and their trade-offs.
Standards: Basic Characteristics
We analyzed issues pertaining to the characteristics of good standards, as portrayed in the literature, along five dimensions:
1. The standardization process
2. The timing of standards
3. The standardization level
4. The scope and extensibility of standards
5. The acceptance of standards
1. The standardization process: According to the literature, standards are either developed by a recognized standards-developing organization (de jure standards), or emerge as a solution from one provider that has captured a significant share of the market (de facto standards). Table 2.3 provides an overview of these two approaches.
De Facto Standard
- Definition: A product or system from a provider that has captured a large share of the market, and that other providers tend to emulate, copy or use in order to obtain market share [59]. In some cases, the standard reflects the technical approach of the dominant player in the market (e.g., Microsoft) [20].
- Examples: GeoTIFF [84], TCP/IP, PostScript.
- Drawbacks: Lack of backing by an industrial agreement [97].
- Discussion: From the vendors' perspective, competitors each try to establish and advertise their technologies as the de facto standard for the field, even if it sometimes means a "premature commoditization" of their technology [9]. Although unappealing, this rush is necessary since by waiting, vendors risk the de facto becoming a competitor's technology. Eventually, when market penetration is high, a public body may review the de facto technology and accept it as a standard (e.g., TCP/IP, Xerox Ethernet) [88]. In the 1960s and 1970s, the abuse of de facto standards led to the creation of more formal standards programs that promised to free users from uncomfortable vendor dependencies [59].

De Jure Standard
- Definition: A solution created by a formally recognized vendor-independent standards-developing organization such as IEEE or ISO [20]. Such standards are developed under rules of consensus in an open forum where everyone has the chance to participate [59].
- Examples: CSMA/CD (Ethernet), SQL, ISO X.11.
- Drawbacks: A lengthy formal process sometimes leads to the solution trailing leading technologies [88].
- Discussion: Designing anticipatory standards for future technologies is a lengthy process, since details can be argued for years [19]. This process involves a considerable amount of negotiation between parties that often have their own interests to pursue. It is frequently impeded by a wait-and-see policy whereby vendors wait until there is broad-enough support for the standard before implementing it; for example, while the ISO X.400 standard was slowly being adopted by the industry, SMTP had already become the de facto standard for e-mail [20]. In some cases, vendors augment the standards through unique additions that add value to the users of their proprietary technologies [9], e.g., the "flavors" of HTML that resulted from the competition between Microsoft and Netscape.

Table 2.3: De Facto versus De Jure standards.
2. The timing of standards: Correct timing of the development and release of a standard has
a major influence on its success and acceptance. As shown in Table 2.4, standards tend to lose
some effectiveness when they are released too early or too late.
Early Standardization
- If a solution occurs too soon (e.g., Graphic Kernel System [97]), there is the risk of delaying searches for better solutions, distracting innovation in the field [33], or freezing technologies to the point of making them obsolete [19].
- An early solution often leads to a premature commitment to detail when it might be too early to distinguish the nice-to-have from the need-to-have features [71].

Late Standardization
- If a solution comes out too late (e.g., Asynchronous Transfer Mode [97]), the acceptance of the standard is usually delayed as users are reluctant to use the standard right away, having already implemented solutions of their own [97]. This also leads to confusion in the market in terms of coping with the various implemented solutions [71].
- Late standardization is either the result of a lengthy development and consensus process [33], or the result of a community waiting for the "best" solution. In either case, the technology misses its market [71].

Table 2.4: Timing of standards.
3. The standardization level: Depending on the problem, standardization can occur at any
one of many technical levels, as shown in Table 2.5.
Data (or data transfer) standard
- Heterogeneous systems can freely exchange data by either having each system understand the other systems' formats or by following a data or interchange standard [47]. E.g., HTML as a document transfer standard for the Internet.
- Complex translation methods between data formats (mostly performed as off-line, batch-oriented processes) are often lossy [34], and result in duplication and redundancy of data, not to mention the large pool of needed translators [44].
- It is usually hard for an industry to move to a single data standard. Hence, standards at this level tend to take more time to develop and be adopted than other standards [22].

Interface standard for data access and processing
- This implies adherence to a single common interface through which heterogeneous resources and services can be accessed [100]. The focus here is on protocol interfaces that allow access to data and operations regardless of underlying implementations. E.g., TCP/IP.
- Being at a higher level of abstraction, the interface standard is vendor-independent (in terms of data format) and hence is not tied to a proprietary solution [59]. An interface standard is often coupled with the definition of an abstract data model, defining a meaningful domain-dependent subset of the data types needed.

Metadata standard for data descriptions
- This involves defining standards for structured data descriptions that allow users to consistently query data sources about their underlying properties. Such a standard is often used when a field is unlikely to converge on a single data standard [34]. E.g., DIF (Data Interchange Format), the satellite imagery metadata defined by NASA data centers [97].

Table 2.5: Levels of standardization.
4. The extensibility and scope of the standard: Extensibility and scope are probably the
most critical attributes characterizing a standard in addition to being the most challenging to realize and calibrate. They both impact the success and use of a standard in the long term.
A certain degree of extensibility is desired in a standard to avoid the risk of having both the standard and the technologies using it become rapidly obsolete. However, incorporating extensibility imposes the challenge of anticipating and accommodating today the possible requirements and technologies of tomorrow [59].
As for the scope of a standard, it is usually determined by a balance between the number or
breadth of functionalities needed and their depth [71]. Depending on the situation, depth is sometimes compromised to gain wider acceptance and ease of use. As an example, consider the HTML language: its design as a simple and easy-to-use document transfer standard for the web gave it wide acceptance but considerably limited its functionality. This limitation in turn created the need for more sophisticated standards at the abstract task-building and data-tagging levels, such as Java and XML.
5. The acceptance of the standard: Acceptance or conformance to a standard by a critical
mass is key. Simply stated, the true test of a standard is in how well it is accepted and how well it
meets the evolving needs of both vendors and users. This of course depends on the quality and timing of the standard, as well as on the availability of incentives offered to encourage its use [96]. In
summary,
A "good" standard does not only have to be consistent, expressive, simple, comprehensive and extensible, it also needs to be available at the right time and be
accepted by users and vendors [98].
Achieving this objective naturally requires many compromises as the listed characteristics are
often conflicting. For instance, expressiveness and consistency often can only be achieved at the
expense of simplicity. The same applies to the completeness of a standard versus its extensibility.
Striking the right balance among these characteristics is a major challenge in the process of defining a standard, in addition to the difficulties associated with creating specifications of high technical quality, determining the functionalities to be included, and accommodating most parties'
interests [59]. In the Internet age, these challenges are even further accentuated as the environment's global reach and requirements can only make it harder to agree on a common denominator,
while the rapid pace of today's technological advances adds to the timing and extensibility challenges.
We conclude from the previous discussion that it will be almost impossible to find a "one size
fits all" standard. Accordingly, we expect that each setting will be unique. Given the essential role
standards have played in GIS, it is not surprising that there are ongoing GIS-related interoperability and standardization efforts. In the next section, we look at some ongoing efforts in order to help
better position this study (and later its findings) with respect to these efforts.
2.3 Ongoing Standardization Efforts
Interoperability and standardization are increasingly considered hot topics on the GIS
agenda. This section presents a subset of the efforts currently underway to solve parts of the GIS
interoperability and standardization problem. The subset selected is used to illustrate the various
levels at which solutions are being proposed. Naturally, these efforts are dependent on other IT
standardization efforts (not covered here), such as those related to databases, network protocols
and data formats.
2.3.1 Early standardization efforts
Early GIS standardization efforts attempted to solve the problem of data sharing and distribution by standardizing data exchange formats. Some efforts focused on importing and exporting data through common formats such as the TIGER format, while other efforts involved introducing
vendor-independent formats such as the DLG (Digital Line Graph), DRG (Digital Raster Graph),
DEM (Digital Elevation Model), DOQ (Digital Orthophoto Quadrangle) formats developed by
USGS [97]. Although successful, these formats are limited to data exchange at the file level only. Furthermore, these latter approaches were heavily criticized for not representing the conceptual data model of the data, in terms of useful object-attribute relationships, spatial reference systems and other metadata elements [5].
2.3.2 Spatial Data Transfer Standard (SDTS)
The Spatial Data Transfer Standard (SDTS) was proposed as a more complete solution at the federal level. It is a vendor-independent intermediary data exchange format following the general-purpose data exchange format defined in ISO 8211 [97]. As such, it was carefully
designed to represent the varying levels of data abstraction perceived as necessary for a meaningful data exchange. The work on SDTS was initiated in 1982 by the National Committee for Digital
Cartographic Data Standards (NCDCDS) consisting of interested parties from the private industry,
governmental agencies and academic institutions. In the late 1980s, SDTS was created following the merger of the NCDCDS and the Federal Interagency Coordinating Committee on Digital Cartography. In 1992, SDTS was accepted as a Federal Information Processing Standard (FIPS 173).
Currently, the Federal Geographic Data Committee (FGDC)1, established in 1990 by the US Office of Management and Budget for promoting and coordinating national digital mapping activities, is the organization responsible for overseeing the approval process of SDTS [5].
1. http://www.fgdc.gov
Despite its appeal and its ability to serve a fundamental need, SDTS did not receive the broad market acceptance hoped for at the time of its design. Unfortunately, the attempt to make SDTS as comprehensive as possible also made it very complicated, hindering its wide use and acceptance [95]. It also lacked proper supporting educational material, thus requiring significant investment to build and maintain the expertise and software within an organization using it.
Additionally, SDTS exhibited extensibility problems when it was unable to respond effectively to unanticipated requirements such as support for value-added extensions by users, and harmonization of metadata content with emerging international standards [5]. Most of all, SDTS arrived somewhat too late: by the time it was introduced, vendor standards were already in wide use, satisfying considerably large groups of users.
Since 1994, SDTS has experienced renewed interest as a result of a presidential order establishing the National Spatial Data Infrastructure, which mandates that federal agencies producing spatial data adopt a universal standard to facilitate data transfer and reuse [95]. The establishment of the Open GIS Consortium has had a similar effect on the interest in SDTS [74].
2.3.3 National Spatial Data Infrastructure (NSDI)
The NSDI is "an umbrella of policies, standards, and procedures under which organizations
and technologies interact to foster more efficient use, management and production of geospatial
data. It includes the technology, policies, standards, and human resources necessary to acquire,
process, store, distribute and improve utilization of spatial data" (Executive Office of the President, 1994). The infrastructure was created to address the needs of federal agencies faced with
lower budgets and stronger demand for better-quality geographic information. It was also a response to the need to organize and share the increasing amounts of spatial data independently collected by these agencies (which have benefited from cheaper technologies and advances in digital technology) [76].
NSDI was first initiated in 1993 by the Mapping Science Committee of the National Research
Council. In 1994, President Clinton issued Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure", which initiated the implementation of the NSDI and assigned the FGDC the task of leading its development [96]. The
goals of the NSDI involved creating:
1. The National Geospatial Data Clearinghouse, a decentralized network of Internet sites that
provides access to metadata of available geographic information [34]. FGDC adopted the Z39.50 standard1 and developed a metadata content standard for searching and retrieving information from the clearinghouse.
2. The National Digital Geospatial Data Framework, which consists of a set of commonly used
layers of geographic information, and defines approaches for sharing data production responsibilities.
3. Data sharing standards, such as the Content Standard for Geospatial Metadata and the
SDTS.
Although it is a great first step towards a unified infrastructure for geographic information,
some consider the NSDI to be vulnerable to political and funding conflicts, by virtue of being led
by a government agency. Critics are also frustrated by its slow progress [53]. Increasingly, the
NSDI has been criticized for focusing on the production and management of spatial data instead of
the development of more effective data dissemination processes, especially in light of the growing
information disseminating role of the Internet. Given that it was initiated in 1993/1994 when the
web had not yet taken off, the NSDI did not anticipate, let alone take advantage of, the impact and potential of this new medium, in which it becomes ever more critical to find efficient ways of identifying and locating the needed distributed geographic information [76].
2.3.4 Open GIS Consortium (OGC)
The Open GIS Consortium is a consensus-based association founded in 1994 to address the
interoperability problem in current GIS systems, and to create GIS standards that follow mainstream IT trends. The consortium's current membership count is over 200 organizations, including
major GIS vendors, computer companies, system integrators, academic institutions as well as various public organizations. As such, the organization is an interesting case of the "coopetition"
model, where competitors collaborate in creating value through interoperability and compete in
dividing the user base and services [14]. Coopetition is known to yield a faster overall market
growth as it bypasses prolonged periods of shaking out competing products.
With such broad support from the community, the OGC is currently a major force in the trend toward GIS openness, leading the efforts to create open interfaces and common tools that enable communication and exchange of data and services among GIS systems. The OGC has been working primarily on two interface specifications:
1. A specification for a common object geodata model which provides a common language for
the geoprocessing community by defining interfaces for geodata modeling.
2. A specification for a common object services model which defines interfaces for common
geodata services needed to work with GIS information in a distributed environment. For more
information about these specifications, refer to the Open GIS guide at http://www.opengis.org.
1. Also adopted by the Canadian Earth Observation Network (CEONet) [69].
Of particular interest to us is an active group within the OGC known as the Web Mapping Special Interest Group. Acknowledging the rapid growth of the Internet and its role in geographic data
and services dissemination, the group was formed in June 1997, with the mission of defining services that allow web-based mapping clients to simultaneously access and directly assemble maps
from multiple servers using Internet protocols. Their focus is on "achieving interoperability of
map assembly and delivery middleware within a heterogeneous distributed computing environment" [78].
In the form of organized testbeds, the group provides a forum for web mapping technology
providers and users to determine standard interface specifications that will facilitate web-based
mapping. After the first web mapping testbed (WMT 1), the group has published three server specifications of interest to us:
1. The Map request, sent in the form of a cgi call to the server, takes the geographic bounding
box of an area, its desired picture pixel width and height as specified by a client (among other
parameters), and returns the desired area at the right scale in either a picture or graphic format.
2. The Capabilities request allows users to query a server for its capabilities, its services and the layers it serves. Appendix B discusses the Map and Capabilities requests in more detail.
3. The FeatureInfo request allows users to ask for information on particular features of a map retrieved using the Map request. This request is still geared towards the picture case and is less well defined in the graphic element and feature cases.
WMT1 focused on covering the basic process of providing information requested by users in
the form of an overlay of transparent images fetched from multiple map services compliant with
the above specifications. Although suitable for viewing and simple querying of geographic information, such an approach was quickly found to be unfit for more sophisticated uses, where the raw underlying GIS data is needed. For example, there are cases where users need to have more
control over specific features of the data as well as its display characteristics in terms of styling,
colors, bands among other attributes. In the second phase of the testbed (IP2000), the focus consequently shifted to providing users with more control over the data requested, and giving them the
option of fetching data as features and coverages, as opposed to image snapshots. At the time of
this writing, the IP2000 workgroups are working on specifications for legend retrieval, coverage
and feature interfaces as well as security and other e-commerce services (for more information on
these efforts, we refer the reader to the OGC web page).
The OGC web mapping testbeds provide us with a rich set of examples and specifications that
are directly related to our work. As noted in Section 1.3, we will comment on some of the efforts
of the WMT and the OGC, and present our views on the next round of issues that may be
addressed in their context.
Finally, it is important to point out that the OGC efforts presented here are not entirely independent of the other efforts discussed in this chapter. In many ways, all these efforts complement
each other and their respective organizations recognize that they have common basic objectives.
For instance, the FGDC is an active sponsor in the OGC web mapping testbeds. Furthermore, both
the OGC and the FGDC have worked together in committees and workshops related to the Digital
Earth Initiative' [74]. The OGC is also involved in international standards working groups, such as
the ISO/TC 211, and works closely with the ANSI X3L1-2 (a committee that works on spatial
extensions to SQL).
At the same time, in areas that are still under development, the task of reaching consensus and the necessity of being responsive to changes in the environment become even more difficult. Accentuated by the complex nature of GIS and the dramatic change in the conceptualization of GIS systems and components, the consensus-based OGC process can sometimes be lengthy [34]. Moreover, the presence of many competitors in the organization sometimes undermines the process through political compromises as well as contradictory and counterproductive pushes for technologies that best serve individual members' own interests [69].
2.3.5 World Wide Web Consortium (W3C)
The W3C is a global organization comprised of companies, non-profit organizations, industry
groups and government agencies. Recognizing the many advantages of representing graphics sent to browsers in vector rather than raster formats, the W3C has been increasingly interested in standardizing the format and structure of vector data. The advantages of vector formats stem from the
considerably smaller size of a vector data representation compared to an equivalent raster one. The
smaller size makes vector data better suited for transmission over a network. Vector data also allows for more interactivity, as it retains the object nature and scalability of its constituent elements. It is also more easily displayable in a wide range of resolutions without loss
of quality [109]. These points are particularly important if we consider the emerging need to provide GIS services for mobile users, whose data connectivity is typically bandwidth-limited.
The standardization of the vector format will be highly beneficial to the GIS community, since
many common vector formats are proprietary, inhibiting the efficient exchange of this data and
affecting its manipulation and display in a browser. In this context, the role of the W3C is to produce a specification for a scalable vector graphics format, written as a modular XML namespace [109]. XML is favored because it is extensible and text-based, and hence facilitates the tasks of exchanging and reading vector data.
1. The Digital Earth Initiative is an ad-hoc inter-agency working group formed to define the US federal participation in Digital Earth. The latter is defined as a virtual representation of the planet that enables users to explore and interact with vast amounts of natural and cultural information gathered about the Earth (http://digitalearth.gsfc.nasa.gov). This initiative naturally requires a certain level of interoperability and standardization of spatial data, metadata and services.
Towards the goal of standardizing the vector format, the W3C formed the Scalable Vector
Graphics (SVG) working group in August of 1998, to design a common vocabulary (DTD) for
scalable vector data. Several proposals have been submitted for SVG compliance, including
PGML (Precision Graphics Markup Language), WebCGM (Web Computer Graphics Metafile),
and VML (Vector Markup Language) [49].
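To give a flavor of what such a text-based vector format looks like, the sketch below serializes a four-vertex line feature into a minimal SVG-style document. The helper function is ours, and the markup is illustrative rather than a conformance example; the point is that a handful of coordinates travel as a few dozen bytes of readable XML.

```python
def polyline_to_svg(points, width=400, height=300):
    """Render a coordinate list as a minimal SVG-style XML document.

    Illustrative only: element and attribute names follow the SVG drafts
    discussed above, but this is not a conformance example.
    """
    path = " ".join(f"{x},{y}" for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline points="{path}" fill="none" stroke="black"/>'
        "</svg>"
    )

# A coastline segment as four vertices: a few dozen bytes of text,
# versus a full raster image of the same feature.
svg = polyline_to_svg([(0, 0), (50, 80), (120, 60), (200, 150)])
```

Because the result is plain text, it can be inspected, diffed, and transformed with ordinary XML tooling, which is precisely the extensibility argument made for XML above.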
In a way, these vector standardization efforts complement the efforts of the web mapping
group, whereby the work on the GIS servers' interfaces would not be complete without a standard
for the data returned from these servers.
2.3.6 Other Efforts
Before concluding this section on ongoing GIS interoperability efforts, it is important to highlight some of the international efforts. One such effort is driven by the ISO/TC 211 committee,
which includes members such as the European Petroleum Survey Group (EPSG), the International
Society of Photogrammetry and Remote Sensing (ISPRS), the Japan National Spatial Data Infrastructure Promoting Association, Europe's Joint Research Center and the Open GIS Consortium.
This committee works on emerging international standards for digital geographic information, such as the ISO/TC 211 Geographic Information/Geomatics standard. ISO/TC 211 has, however, been criticized for focusing on data rather than on practical implementation aspects [97].
The Canadian Geospatial Data Infrastructure (CGDI, www.geoconnections.org) is another
independent effort that tackles the problem of accessing heterogeneous and distributed spatial data
in an interoperable fashion [44]. Products developed as a result of this initiative include
1. The Open Geospatial Datastore Interface (OGDI), an API that sits between an application and a geospatial data product to provide access to that data. The drawback of this method is that a separate driver needs to be developed for each data format before the API can access that format.
2. The Geographic Library Transfer Protocol (GLTP), a stateful protocol (in contrast to the
stateless protocols developed by OGC) that retains knowledge of all transactions pertaining to a
connection, hence allowing for more efficient processing of successive related queries or transfers
of geospatial data.
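The contrast drawn here between GLTP's stateful connections and OGC's stateless protocols can be sketched in miniature as follows. The function and class names are ours, for illustration only, and do not reproduce either protocol's actual API.

```python
def stateless_query(dataset, bbox):
    """OGC-style: every request carries its full context; the server
    remembers nothing between requests."""
    return f"querying {dataset} within {bbox}"

class StatefulConnection:
    """GLTP-style: the connection retains knowledge of the dataset and of
    prior transactions, so successive related queries can reuse context."""
    def __init__(self, dataset):
        self.dataset = dataset
        self.history = []          # server-side state kept per connection

    def query(self, bbox):
        self.history.append(bbox)  # a stateless protocol must not keep this
        return f"querying {self.dataset} within {bbox}"

conn = StatefulConnection("hydrology")
conn.query((0, 0, 10, 10))
conn.query((0, 0, 20, 20))   # a refinement of the previous query
```

The retained `history` is what enables more efficient processing of successive related queries, at the cost of per-connection state on the server.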
Another effort worth mentioning in this section involves the .geo proposal made, at the time of this writing, to the Internet Corporation for Assigned Names and Numbers (ICANN) by SRI, a California-based company. SRI proposed a location-based top-level domain (.geo) to support geospatial indexing and referencing of data available on the Internet. Briefly, the proposal suggests assigning unique domain names for grid cells (of different scales) that would cover
regions of the earth. Each geographic cell is assigned a cell server, which is responsible for storing
and responding to queries for geodata that lie within its cell boundary. Cell servers would be maintained by organizations called geo-registries, which can charge users for registering their data with
them. Such a setup would allow users to search for data by location via geo-enabled search
engines. The details of the proposal are available at http://www.dotgeo.org.
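To make the cell-indexing idea concrete, the following sketch snaps a coordinate to a 10-degree grid cell and derives a domain-style name for it. The naming scheme here is entirely hypothetical and does not reproduce the scheme in SRI's actual proposal; it only illustrates how a location maps deterministically to one cell server.

```python
def cell_domain(lat, lon, cell_size_deg=10):
    """Snap a latitude/longitude to its grid cell and build a
    hypothetical cell-server domain name for that cell."""
    lat0 = int(lat // cell_size_deg) * cell_size_deg   # cell's south edge
    lon0 = int(lon // cell_size_deg) * cell_size_deg   # cell's west edge
    ns = "n" if lat0 >= 0 else "s"
    ew = "e" if lon0 >= 0 else "w"
    return f"{abs(lat0)}{ns}{abs(lon0)}{ew}.cell.geo"

# Boston (42.36 N, 71.06 W) falls in the 10-degree cell anchored at 40N, 80W,
# so a query for geodata there would be routed to that cell's server.
boston_cell = cell_domain(42.36, -71.06)
```

A geo-enabled search engine could then resolve a spatial query to the handful of cell servers whose cells intersect the query region.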
Although intriguing and innovative, the proposal requires many issues to be resolved at both
the technical and the regulatory levels. It was criticized by some for proposing a fee-based registration mechanism, for adding yet another level of complexity to the Internet, and for restricting
access to the public by bureaucratizing geospatial data [91]. Consequently, the proposal was turned
down by ICANN.
Given the utility of searching for data according to its geographical location, we believe that, despite these issues, the technical and regulatory details will eventually be worked out, leading
to a consistent scheme of geo-referencing information on the Internet. Such a scheme would be
very useful in the distributed geo-processing infrastructure, especially in the context of some of the
architectural setups discussed in Chapter 4.
In the next two sections, we present two distributed models for delivering and accessing services over networks. The goal will be to identify and anticipate, through an understanding of these models, the issues that are likely to arise in the distributed geo-processing infrastructure.
2.4 The Internet
The Internet is certainly the most talked-about computer-related phenomenon of the nineties. It is
presented in this thesis as a powerful case of a successful distributed environment. In including a
discussion about the Internet here, we are interested in first extracting the key factors that contributed to its wide proliferation and success at the technological, architectural and organizational levels. Then, we want to identify those factors that are most likely to apply to the case of a distributed
geo-processing infrastructure. In addition, we hope to gain some insight into what makes the distributed geo-processing infrastructure unique.
2.4.1 Identifying Key Success Factors
This section summarizes our understanding of how the Internet, a loosely-organized international collaboration of autonomous, interconnected heterogeneous sub-networks, can provide the illusion of being a single homogeneous network [20], and how this federation of autonomous systems can cooperate for mutual benefit to form such a scalable environment [88]. The key factors that contributed to the success of this phenomenon, and that are relevant to our discussion of a distributed geo-processing infrastructure, are:
" Key factor 1: Simplicity and tolerance of underlying standards
- Key factor 2: Consensus for consistency
" Key factor 3: Extensibility and evolution
- Key factor 1: Simplicity and tolerance of underlying standards
Standards constitute the foundations of the Internet. They are the glue that keeps computers
and applications working in harmony in such a global decentralized environment [69]. We argue
that it is the simplicity and the tolerance of these standards that have enabled the rapid growth of
the Internet. Consider, for instance, http, the underlying communication protocol which governs
the order of conversations exchanged between clients and servers [63]. The http protocol was
designed with network efficiency and scalability in mind.1 Its simplicity and tolerance partially
derive from its stateless nature, which considerably simplifies its request/response operations.
Similarly, the Internet's most common document transfer standard, html, played an important role
in the proliferation of the web. Html's rapid adoption was due to its ease of construction given the
availability of various authoring tools, and its ease of use, given the simplicity of browser usage.
In both examples (http, html), compliance to the standards is enforced operationally rather
than through a formal process: Web pages or web servers need not be certified by any formal
authority, or registered before use. Instead, a document is considered conforming to the html standard if it can be displayed in a rather consistent manner across available browsers. Similarly, a web server is considered accessible if it does not return an error at the time of a request. The resultant tolerance of
the environment simplifies the design of clients and servers, and is therefore a major driver in the
proliferation of the Internet.
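To see how lightweight the http conventions are, the sketch below formats a complete, self-contained request by hand (the helper function is ours, for illustration). Because the protocol is stateless, this short piece of text is everything the server needs in order to answer; no memory of earlier requests is assumed on either side.

```python
def build_request(method, uri, headers=None):
    """Format a minimal http/1.0-style request: a request line,
    optional headers, and a blank line ending the header section."""
    lines = [f"{method} {uri} HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    lines.append("")   # blank line terminates the headers
    lines.append("")   # no body for a simple GET
    return "\r\n".join(lines)

# The entire context of the conversation fits in a few lines of text.
req = build_request("GET", "/index.html", {"Host": "example.org"})
```

The server replies in the same textual style: a status line, headers, and an optional body. This simplicity is a large part of why clients and servers are easy to build, and hence why the protocol tolerated such rapid, decentralized growth.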
- Key factor 2: Consensus for consistency
Despite the diversity of providers and systems, what brings the Internet together and maintains
its wide connectivity, is a consistent naming of URLs as well as a consistent assignment of unique
identifiers for IP addressing and domain names. Because these assignments have to be unique,
1. http, in use since 1990, is based on a request/response paradigm. A client establishes a connection with a
server and sends it a request in the form of a request method, URI, and protocol version, followed by a
MIME-like message containing request modifiers, client information, and possible body content. The server
responds with a status line, including the message's protocol version and a success or error code, followed
by a MIME-like message containing server information, entity meta-information, and possible body content
(for more details refer to http://www.w3.org/Protocols/).
they are coordinated by the Internet Assigned Numbers Authority (IANA) and the Internet Network Information Center (InterNIC), which lead the organizations responsible for assigning IP addresses and domain names, respectively.
IANA and InterNIC belong to a larger group of organizations that maintain Internet consistency and standards. Table 2.6, which lists the most visible of these organizations, shows that the coordination and cooperation needed to maintain Internet connectivity happen without a central coordinating authority. Instead, consistency is maintained by a handful of organizations, each influential in its own right. Collectively, these organizations form a checks-and-balances system for the related efforts [68]. Figure 2.3 summarizes key factors 1 and 2.
World Wide Web Consortium (W3C): Sets data exchange standards for html, XML, etc.

Internet Engineering Task Force (IETF): A volunteer organization that focuses on the evolution of the Internet and on keeping it running smoothly as a whole. Its work covers developing Internet protocols and standardizing what has already been developed [69].

Internet Architecture Board (IAB): Responsible for defining the overall architecture of the Internet (backbones, networks, etc.). Also keeps track of various numbers that must remain unique, such as the 32-bit IP addresses [69].

Internet Society (ISOC): A supervisory organization that comments on Internet policies and practices and oversees other boards and task forces, including the IAB and the IETF.

Internet Assigned Numbers Authority (IANA): Leads the organizations responsible for assigning IP addresses.

Internet Network Information Center (InterNIC): Leads the organizations responsible for assigning domain names.

Table 2.6: Organizations involved in Internet coordination and standardization.
Figure 2.3 Simplicity of Internet standards and consensus for consistency. (The diagram links a client and a server through: (1) the http communication protocol, simple, stateless, tolerant, and designed with network efficiency and scalability in mind; (2) the html document transfer standard, simple to learn and to use through browsers; and (3) consistency: consistent naming of URLs and unique identifiers for IP addressing and domain names.)
- Key factor 3: Extensibility and evolution
It is interesting to understand how a distributed environment like the Internet and its technologies gradually evolve to accommodate an exponentially growing pool of users and information
[40]. An example of such an evolution was triggered by the increasing difficulty of locating information in the growing web, to which the market responded by developing new tools that assist
users in categorizing and searching for information. Such tools include search engines, web crawlers, web directories and meta-search sites [37].
Similarly, as Internet usage grew, the demands of its users broadened. They needed more customization, more functionality, and more control over data flow. Html alone could no longer fulfill the users' needs. New technologies were introduced to target these new requirements: web servers were extended via back-end gateways [20], and more sophisticated scripting languages (such as JavaScript and Active Server Pages) were developed. In this context, Java delivered a platform-independent expansion of functionality, and XML provided a standard, extensible way of exchanging information along with its semantics and structure. Gradually, other technologies were built on top of those mentioned, such as the Java-based distributed computing platform JINI, and the XML-related technologies of XSLT and XML-RPC, among others.
These technologies have followed the evolution of the Internet from its beginnings as a simple
network to what it is today, the foundation of today's e-commerce as well as many of today's
enterprise computing environments.
2.4.2 Implications
Even though it remains to be seen how challenges such as possible information overload and security risks will be addressed on the Internet [20], there are still some key lessons worth exporting to the GIS paradigm. One conclusion from our brief Internet study is that the success of the Internet has been as much about the technology standards as about the culture of interoperability embraced by its users, and their shared belief in the importance of interoperability in such a medium. This embrace of interoperability will be equally critical for the success of geo-processing.
As discussed in the previous section, the Internet does not provide a "one size fits all" solution
to every user. Instead, it is supported by a range of technologies targeting different needs. It is up to
the users to pick and combine the technologies they need to create solutions that best fit their needs
and constraints. This observation implies that, in the case of the geo-processing infrastructure, we
should not aim to find the one optimal configuration that will work for this domain. Instead, the
goal should be to identify both the constituent elements that can be combined, and the issues that
need to be addressed in order to provide the users with the flexibility of assembling their own solutions.
In addition to the above conclusions, this study highlighted the importance of having simple, consistent and tolerant underlying technologies that can support a scalable distributed infrastructure. Operational compliance with standards, as well as the distribution of responsibilities across cooperating organizations, were also identified as key factors in maintaining the infrastructure and its growth without resorting to centralized controlling authorities.
The value of this study and its implications lie in providing us with an initial set of relevant
technologies, requirements and issues to consider. In our subsequent discussion of the distributed
geo-processing infrastructure, we will need to determine whether these requirements and issues
are applicable to it. We will also need to examine whether some Internet technologies (such as
those related to security, e-commerce, authorization, etc.) are generic enough to be used directly in GIS, or whether GIS characteristics will require specialized solutions.
2.5 Application Service Providers (ASP)
A discussion on Application Service Providers is included in this chapter as an example of a commercially available practice of delivering functionalities to users as Internet-based services. We
hope to use some lessons and strategies from the ASP model, as we expect it to have similarities
with the distributed geo-processing infrastructure, both in terms of key drivers and challenges
induced by a changing IT environment and the complexities of service integration.
2.5.1 Overview
ASPs are independent third-party providers of software-based services, which typically specialize in a slice of business computing [102] delivered to customers as packaged, ready-to-use resources across a wide area network [6]. Although the ASP market only recently surged, the literature indicates that the core concept is not new: it was first proposed a third of a century ago by Greenberger at MIT [102]. Its recent reemergence in the IT world is occurring as a result of the convergence of today's technology and market conditions, as depicted in Figure 2.4. The
evolving technologies and practices have enabled the ASP concept to prove its potential for cutting
costs and providing flexibility to users, two of its many advantages (see Table 2.7).
Advantages for Users:
- Flexibility: customers pick what's right for them.
- Avoiding being locked in by a certain vendor.
- Faster time to benefit; no management or maintenance of software (transparent to users).
- Access to otherwise out-of-reach technologies.

Advantages for Service Providers:
- Specialization according to expertise.
- Sharing the costs of expensive resources among a number of customers.
- Integration of services increases the value of each service.

Table 2.7: Advantages of the ASP model.
Business conditions:
- Fast pace of business.
- Increased competition.
- Prevalence of e-commerce.
- Increase of on-demand computing provision.
- Need to focus on core business competencies, especially with the rise of dot-coms that cannot afford to do everything internally.
- Cutting the costs of ownership and daily management when only simple subsets of functionality are needed.

Technology conditions:
- Growing global telecommunications infrastructure based on IP technology (increasing bandwidth, lower costs).
- Experience in services such as security (SSL, SKI) and virtual private networks.
- Maturity of XML as a data exchange standard.
- Maturity of server-based computing on the WWW (technologies allowing applications to run centrally, such as CGI, Java and Windows DNA).
- Availability of easily configurable software that lends itself to online delivery as an Internet service.

Figure 2.4 Convergence of business and technology conditions (the conditions listed above converge to enable the Application Service Provider model).
Common examples of services offered by ASPs range from back-office automation to e-commerce and customer relationship management applications. An interesting example involves the
recently revealed .Net initiative of Microsoft, in which the company offers to provide its suite of
software as web services on the Internet, instead of directly selling it to its clients [30]. Given
increasing demands for a wealth of information and resources distributed across a network,
Microsoft aims to add value by hiding, from the user, the complexity of locating this information,
retrieving it, and integrating it with other information on various devices. XML plays a prominent
part in this approach, as it does with most ASPs, because it allows maximum customization of
solutions while remaining standard in terms of describing them. At the time of this writing, customers and competitors were waiting to see whether Microsoft would succeed in its "services
instead of software" vision [62].
2.5.2 Issues
Before we discuss some ASP-related issues and their relevance to the distributed geo-processing model, we note the difference between the abstraction levels at which the two models provide their solutions to users. Even though both models advocate service-centered architectures, the definition of service is different in each case. Under the ASP model, a "service" refers to a hosted, prepackaged application (which includes a user interface and data storage) that is ready for customers to use. In the case of the geo-processing infrastructure, a "service" is an autonomous
GIS component that can be called by other applications via a standardized access protocol. Given this difference in defining services, and its implications for the service provision model and its dynamics, it follows that only a subset of the ASP issues will be applicable to the GIS model. In this section, we discuss those issues we think are most relevant for the GIS case.
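As an illustration of the GIS notion of a service, the following sketch assembles a WMT-style map request URL of the kind discussed in Appendix B. The function and the exact parameter names are ours and should be treated as illustrative rather than as the specification's normative syntax; the point is that any application can call the service through a standardized, self-describing request.

```python
from urllib.parse import urlencode

def map_request_url(server, layers, bbox, srs, width, height, fmt="image/png"):
    """Build an illustrative WMT-style map request URL (parameter names
    are a sketch, not the verbatim specification)."""
    params = {
        "REQUEST": "map",
        "LAYERS": ",".join(layers),
        "BBOX": ",".join(str(v) for v in bbox),
        "SRS": srs,
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{server}?{urlencode(params)}"

# Any client or intermediate service can issue this same request:
url = map_request_url("http://gis.example.org/servlet/map",
                      ["roads", "hydrology"],
                      (-71.2, 42.2, -70.9, 42.5), "EPSG:4326", 400, 300)
```

Contrast this with the ASP model, where the "service" is the whole hosted application, interface included, rather than a component addressable by other software in this way.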
Scalability
In any distributed system, scalability of the underlying infrastructure is critical to a sustainable
growth of that system. While an integrated design for a global ASP infrastructure does not yet
exist, ASP service providers are individually working on the scalability of their own service provisions, to help ensure acceptable performance when service loads are high. Towards that goal, service providers are experimenting with their internal setups of servers, fine-tuning variables such as
the number and horsepower of main and redundant servers, as well as the bandwidth available
between them and their clients. Other efforts are directed at maintaining clients' data integrity and
enforcing security in the distributed setup. The results of the efforts related to bandwidth, data
integrity and security are of particular interest to us when discussing the GIS infrastructure: the
size of exchanged GIS data can be potentially large, and the challenge of maintaining data integrity
is considerable, given the likelihood of data transformations applied to produce data that corresponds to specific clients' requests and needs.
Integration of Services
Experts in the industry are still waiting to see if the single application approach will be viable
in the long run. It is argued that while this approach is suitable for businesses needing just one
application, it is more likely that companies are going to need more than one application at a time.
If customers choose to locally integrate the independently-provided services, they risk running
into several known problems. On one hand, the various services could be using different models,
formats or databases, hence inhibiting interoperability among the applications. On the other hand,
companies will risk losing their customization of individual services as well as that of the integrated product if services are upgraded or modified by the providers.
In response to these problems, the ASP market is experiencing a surge of partnerships among
service providers. These providers are interested in adding value to their services by integrating
them with other complementary ones to create one-stop shops of coherent and customized applications. This approach introduces its own problems. The most noted ones revolve around concerns
for accountability of service providers in the case of problems or need for customer support.
Indeed it has been repeatedly reported that service providers point fingers at each other when an
"integrated" service under-performs or experiences problems [82].
In order to facilitate service integration and data transfer among ASPs, the ASP Industry Consortium, a non-profit organization comprised of ASP players worldwide, was formed in May 2000 in an attempt to foster the development of open ASP standards and common practices [6]. It is still too early to determine the degree of success of this consortium, as well as the longevity outlook for the ASP approach.
The ASP Market
Given the interoperability trend in the market, the key question becomes how service providers can differentiate their offerings without compromising interoperability. We can
distinguish two trends that are evolving in the ASP market. On one hand, some service providers
are trying to reach the widest segment of clients possible for their services by making them as
generic as possible. In such a market, branding offers an edge to providers with known names,
which expands their product visibility and hence increases their chances of gaining new clients.
ASPs offering generic services can also increase their value by integrating applications and providing one-stop shops and a single point of contact for their clients. The other trend in the ASP market
revolves around specialization, whereby some service providers choose to specialize in providing
services for specific domains and industries. These ASPs produce highly customized products for
their targeted niche of customers.
It is clear that the business models in this market are still evolving, and so are the methods used to charge clients for services used. ASPs are still experimenting on that front as well: some charge customers per individual service use or per data size transferred or processed, while others use a simple flat periodic fee structure for their services [3].
2.5.3 Implications
Although the ASP approach is relatively young, with an as-yet unproven business model and many of its complexities not yet fully resolved, it still provides us with some insights into the issues and dynamics of the distributed geo-processing infrastructure. Some of these issues are summarized below.
In terms of software design issues, we saw that a service-centered model requires a shift in the
engineering and deployment of software applications. This shift to "service engineering" clearly
puts legacy system providers at a disadvantage. This explains why software giants such as Microsoft (and ESRI in the case of GIS) were initially slow to aggressively develop software solutions for thin-client computing. Such big players are also justifiably concerned about the impact of the service model on the sustainability of their businesses and profits. However, it seems
rather unlikely that the service model will replace software as it is currently defined, even in the
long run. Although there are specific applications and environments where it is effective, it will
not always be the most appropriate route for delivering applications to users. We believe that this
observation also strongly applies to the distributed geo-processing infrastructure.
In terms of technologies, the discussion of ASP issues also highlights the growing role of
XML as a foundation for interoperability and data exchange, a role that is easily transferable to the
geo-processing infrastructure. Similarly, technologies developed and used for maintaining security, data integrity and user authentication in the ASP field can also be adopted in the GIS case.
Throughout our discussion, we will highlight any factors that are unique to a distributed GIS solution, leading to a variation in the treatment of these issues.
As for market dynamics, the ASP study has raised the issues of commoditization versus specialization, the importance of branding in this market, the value of integrated services to clients,
and the convenience of one-stop shopping for integrated services. The study hence hints at the
potential value of new combinations and types of "integration" services that may sit between giant
repositories of GIS data and the end user client applications in many niche markets. We will revisit
these issues in the conclusion, when we discuss the dynamics of the distributed geo-processing
marketplace.
To conclude, this chapter provided us with a glimpse of the issues that are likely to play out in
the discussion of architectures supporting a distributed geo-processing infrastructure. Some of
these issues are refined in the next chapter, where we describe a prototyping experiment that sheds
some light on other issues that are more practical in nature.
Chapter 3
Case Study: Raster Image Re-projection
Service Prototype
3.1 Introduction
An integral part of our research methodology consists of a preliminary identification of key choices and issues through a carefully selected prototyping experiment. The rationale behind the prototyping strategy is that the experience can help reinforce and/or identify issues that, while not necessarily unanticipated, were nonetheless hard to conceptualize without actually trying things in an operational setting (refer to Section 1.4.2).
In this chapter, we report on our attempt to design a preliminary interface for raster image re-projection services, and on our experience with building a service complying with the designed interface. It is important to clarify that our objective is not that of developing an optimal raster image
re-projection engine. Instead, the prototyping effort is for the sole purpose of gaining a deeper
understanding of the options and issues associated with the distributed geo-processing infrastructure problem. In particular, in our presentation, we consistently highlight the unexpected difficulties faced in both the design and implementation phases, arising from the lack of GIS
interoperability and the limitations of legacy GIS systems in accommodating the distribution and
scalability requirements. We then translate the highlighted difficulties into specific characteristics
for interfaces and services, needed to ensure a scalable and reliable distributed GIS infrastructure,
especially in the area of service integration and chaining.
We begin by providing the reader with an overview of the raster re-projection process, and the
characteristics that make it an appealing service to experiment with and learn from.
3.2 Re-projection: Overview and Motivation
3.2.1 Overview
This section is not intended to provide a detailed presentation of re-projection but rather to
familiarize the reader with its concept and complexities. For more information, the reader is
referred to the projection and re-projection references included in the bibliography.
Re-projection is the process of transforming an image from one projection system to another. Projections1 are methods of systematically representing the three-dimensional information from the curved surface of the Earth in two dimensions. A set of mathematical equations is used to convert between latitude/longitude coordinates and planar ones. Since any spheroid representing the Earth cannot be transformed to a plane without distortion, other surfaces are used for projecting the globe, such as the cylinder, the cone and the plane (as shown in Appendix C.1). Depending on
the surface selected to approximate the spherical Earth, distortions to the area, shape and/or scale
of the projected image are introduced. Projections have therefore been classified according to the
characteristics of the map they preserve (summarized in Appendix C.2). Depending on the image
size, shape and location, users can pick a projection that minimizes the distortions for their purposes.
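As a small worked example of such projection equations, consider the equirectangular (plate carrée) projection on a spherical Earth, one of the simplest forward/inverse pairs. The radius value and function names below are ours, for illustration only; real projections used in GIS practice involve considerably more elaborate mathematics.

```python
import math

R = 6371000.0  # mean Earth radius in meters (spherical approximation)

def forward(lat_deg, lon_deg, lat_ts=0.0):
    """Latitude/longitude (degrees) -> planar x, y (meters).
    lat_ts is the latitude of true scale."""
    x = R * math.radians(lon_deg) * math.cos(math.radians(lat_ts))
    y = R * math.radians(lat_deg)
    return x, y

def inverse(x, y, lat_ts=0.0):
    """Planar x, y (meters) -> latitude/longitude (degrees)."""
    lon = math.degrees(x / (R * math.cos(math.radians(lat_ts))))
    lat = math.degrees(y / R)
    return lat, lon

# Re-projecting a point between systems chains an inverse and a forward step:
x, y = forward(42.36, -71.06)
lat, lon = inverse(x, y)     # round-trips to the original coordinate
```

Every projection supplies such a forward/inverse pair; the distortions discussed above arise because no pair can simultaneously preserve area, shape and scale everywhere.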
The procedure for re-projecting maps from one projection to another differs depending on
whether the map is in vector format or raster format. For vector data, the process is simple: for
each point on the map, the proper equations are applied to first obtain the equivalent latitude/longitude location values of that point. Next, these values are transformed into coordinates in the target
projection. For raster data, the process involves an additional level of complexity due to the structure of raster imagery as arrays of pixels with varying intensity values.
Re-projection in the raster case involves transforming one Cartesian grid to another. For each
cell (x,y) in the newly projected image, the corresponding coordinate (xo,yo) in the original raster
image must be calculated. As shown in Figure 3.1, problems arise when the cell (x,y) does not
coincide with a cell (xo,yo) in the original image. Interpolation or resampling around (xo,yo) is then
needed to estimate the pixel brightness value at (x,y) [85]. Appendix C.3 summarizes the three
most commonly used interpolation/resampling methods available: the nearest neighbor, the bilinear and the cubic interpolation methods.
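The inverse-mapping loop just described can be sketched as follows, using nearest-neighbor resampling, the simplest of the three methods. The `to_source` argument stands in for the chained projection equations mapping a target cell back into source-image coordinates, and is supplied by the caller; the code is a sketch, not an optimized re-projection engine.

```python
def reproject_raster(src, src_h, src_w, dst_h, dst_w, to_source):
    """For each cell (x, y) of the output grid, find the corresponding
    (xo, yo) in the source image via to_source, then resample."""
    dst = [[0] * dst_w for _ in range(dst_h)]
    for y in range(dst_h):
        for x in range(dst_w):
            xo, yo = to_source(x, y)        # generally non-integer
            xi, yi = round(xo), round(yo)   # nearest-neighbor resampling
            if 0 <= yi < src_h and 0 <= xi < src_w:
                dst[y][x] = src[yi][xi]     # else: left as nodata (0)
    return dst

# Trivial sanity check: an identity "re-projection" copies the image.
image = [[1, 2], [3, 4]]
copy = reproject_raster(image, 2, 2, 2, 2, lambda x, y: (x, y))
```

Replacing the `round` step with bilinear or cubic interpolation over the neighboring source pixels yields smoother results at a higher computational cost, which is exactly the speed/quality trade-off examined below.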
1. Interestingly, projections have been the topic of discussions in thousands of cartography books and
papers dating back to 150 A.D. [89].
Figure 3.1 Raster imagery re-projection. (The figure maps a cell of the new image back to a point, generally falling between cells, in the original image.)
This extra interpolation/resampling step in the case of raster imagery makes the raster re-projection process computationally intensive and slow. Depending on the uses of the imagery, appropriate approximation methods may be used to bypass this step and speed the re-projection.
However, this gain in speed comes at the expense of a loss in data quality, creating an interesting
trade-off which we explore later.
3.2.2 Motivation: Reasons for Picking Re-projection
Raster image re-projection is a relevant service to study and prototype for several reasons.
Undoubtedly, the growing volume of raster imagery becoming available in different projection
systems (Section 2.1.3) is heightening the need for a specialized service to transform these images
from their native coordinate systems to those defined by users. This need has also recently become
apparent in the context of the first phase of the OGC Web Mapping Testbed (See "Open GIS Consortium (OGC)" on page 35).
Essentially, the testbed's main demo consisted of overlaying several maps fetched (according to
the WMT specifications listed in Appendix B) from different map servers in the coordinate system
chosen by the user. This step turned out to be more difficult than originally expected because, in
most cases, only a few servers at a time provided maps in the chosen coordinate system. In order to
make the demo more effective, servers needed to be able to re-project their maps into the chosen
coordinate system. For the purposes of the WMT testbed, participants solved the re-projection
problem in different ways: some converted and stored versions of their maps in the expected coordinate systems, while others developed local re-projection modules to re-project their data as well
as others' before returning the maps to the client. In both cases, the re-projection was transparent
to the end user, who had no control over the process or its parameters.
The WMT demo highlighted the increasingly pressing need for having a generic re-projection
service that can be transparently called by other services on the network, and selectively used as an
intermediate step in the delivery of customized maps. In this context, the true value of such a reprojection service stems from its indispensable role in service chains. Given this role, the design
and implementation of a prototype for such a re-projection service compel us to identify and incorporate, at an early stage, issues that are directly related to service chaining and integration in a distributed infrastructure.
A further advantage of implementing a re-projection service specifically for raster imagery is
that it focuses the discussion in many ways. The simple structure of geo-referenced raster imagery
coupled with its richness in information allows the discussion to center around the technical and
architectural aspects of service design and chaining without the added complexities of semantics.
Indeed, the use of raster imagery in examples throughout this thesis helps push the issues of bandwidth, compression and performance in a distributed environment.
Finally, what makes the re-projection case particularly interesting is that the seemingly simple
re-projection step can readily complicate the chaining setup with the choices to be made regarding
resampling methods, image geo-referencing representations and approximation methods used (as
illustrated later in this chapter). This in turn highlights issues pertaining to tracking image metadata and transformations as well as finding ways of hiding such complexities from the end user.
To summarize, given the need for generic re-projection services, their role in service chains
and their ability to focus the discussion on relevant issues, we believe that raster image re-projection prototyping will provide us with the insights we are looking for.
3.3 Re-projection Service: Design Process and Preliminary Interface
The first step of the prototyping experiment involves devising a preliminary interface for raster re-projection services. The interface is later used in the implementation and testing of a sample
service. Our goal in this first step is to start with a simple re-projection interface that is independent of possible underlying implementations, and that can accommodate, as much as possible, current and expected requirements. We emphasize the interoperability and simplicity criteria behind
the interface as they are pre-requisites to ensuring that compliant services are easy to use (alone
and in chains) and are effortlessly replaceable and substitutable by users.
A summary of the design process as well as the preliminary interface are presented next, with
a focus on the design decisions involved and the unexpected obstacles encountered.
3.3.1 Interface Design Process
In order for the re-projection service to chain easily with other services defined in the broader
setting of the OpenGIS web mapping group (Section 2.3.4), we followed a design process that is
consistent with the one used by the WMT group to define the GetMap and GetCapabilities
requests outlined in Appendix B. The only major difference is that our design process did not benefit from a WMT-like group/testbed environment. Such an environment can systematically inject
more feedback, more input and more control over variables into the design process. The re-projection interface presented here was refined based on our own implementation experiences only. Nevertheless, it succeeds at providing the right depth at which we can explore distributed services and
chaining issues, the main goal of this thesis.
In summary, our design process starts with the identification of the request categories the service is expected to offer and handle. For each request, the input parameters as well as the return
data type(s) are identified. For the purpose of simplicity, we use CGI to deliver the request to the
server, hence specifying the inputs as pairs of (key=value) statements. Below we discuss two
requests that the raster re-projection engine needs to support. The first request is ReProjImage
(described in Section 3.3.2), which accepts requests for re-projecting a geo-referenced raster
image from one coordinate system to another. The second is GetCapabilities (discussed in Section
3.3.3), which returns the capabilities supported by the service (serves the same purpose as the one
described in Appendix B.2).
3.3.2 Image Re-projection Interface
The ReProjImage request is used to transform a geo-referenced raster input image from its
original reference system to another reference system. While working on the presented interface,
we tried to work with current state-of-the-art terminologies and formats, instead of introducing new
formats for projection, image or geo-referencing information. Although restrictive, this approach
allowed us to identify some of the limitations of these state-of-the-art technologies, which are discussed in detail in Section 3.5.1. Below we describe the minimal set of inputs needed for a
ReProjImage request, commenting on the major choices and decisions made in our design process.
A summary of the request inputs is provided in Table 3.1.
- The image to be transformed: This is the most essential input element needed in a re-projection service. A simple way to make an image available to a stand-alone re-projection service on
a network is to send it from its storage location to the service with the rest of the transformation
parameters. This approach is however not practical given that, as described earlier, the image will
then have to be embedded within a CGI request. A more practical approach is to provide the re-projection service with the location from which it can retrieve the image. In an Internet setup, the
location of the image would be encoded in the form of a URL.
Using a URL has several advantages: it makes the chaining of a re-projection service more
flexible as the URL is not limited to pointing to an image that is stored on disk. It can also embed a
request to a map server to construct an image according to certain specifications. Hence in a chain,
instead of preparing a raster image and sending it to the ReProjImage service for re-projection, one
can simply send the transformation parameters along with a request to a map server to construct
the image. In this setting, using a URL in the ReProjImage request minimizes the transfer of
images between services and does not constrain the service to working with "finished" maps. The
URL parameter also makes the re-projection service more efficient as it can request image constructions as they are needed in the process. This point is particularly important since it ensures
"closure" amongst services. In other words, the re-projection service need not differentiate
between a URL that contains a request for an image to be constructed or one that directly points to
an existing image on a server. These two cases are treated identically and the difference is totally
transparent to the re-projection service. This "closure" property will be revisited again with examples in Chapter 4.
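The nested-request case can be sketched as follows. The host name and CGI paths are hypothetical, patterned after the examples in this chapter, and Python's urlencode stands in for whatever percent-encoding the calling client performs:

```python
from urllib.parse import urlencode

# Inner request: ask a (hypothetical) map server to construct the image.
getmap_url = "http://tull.mit.edu/mapserver.cgi?" + urlencode({
    "REQUEST": "map",
    "BBOX": "-97,24,78,36",
    "WIDTH": 560, "HEIGHT": 350,
    "FORMAT": "jpg",
})

# Outer request: hand that URL to the re-projection service as imageURL.
# urlencode percent-encodes the nested URL, so its '&' and '=' characters
# do not clash with the outer request's own parameters.
reproj_url = "http://tull.mit.edu/ReProjImage.cgi?" + urlencode({
    "imageURL": getmap_url,
    "format": "jpg",
    "fromSRS": "EPSG:26986",
    "toSRS": "EPSG:4269",
    "width": 500, "height": 500,
})
```

The re-projection service simply dereferences imageURL; whether that fetch returns a stored file or triggers a map construction is invisible to it, which is precisely the "closure" property.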
- The data format of the image: Given that the re-projection service cannot differentiate
between a finished image (with a known image extension such as http://tull.mit.edu/test.jpg) and
one that needs to be constructed by another server (such as http://tull.mit.edu/
REQUEST=map&BBOX=-97,24,78,36&WIDTH=560&HEIGHT=350&FORMAT=JPG), there
needs to be a way to explicitly communicate the format of that image to the re-projection service.
This data format parameter allows the server to know the type of the image's geo-referencing
information, whether it is embedded in the image itself, as in the case of a GeoTIFF image, or
requires additional inputs to be specified. The alternative would be for the re-projection service to
identify the image format by parsing the input image URL. This alternative is however neither
scalable nor interoperable as it requires services to be able to parse and interpret each others' protocols.
- The geo-referencing information: If the geo-referencing information of an image is not
encoded within the image itself, it is usually saved in a separate text file created specifically for
that purpose. Unfortunately, the formats of these files are currently vendor-dependent. For example, MapInfo uses a .tab format to describe the geo-referencing information of maps while ArcView uses the world file format (.tfw,.tgw,.wld,.jgw etc.). The multitude of formats available for
representing this information poses a problem for the extensibility of our ReProjImage interface,
and emphasizes the need for a standard for describing this geo-referencing information and associating it with its source raster imagery. Given the unavailability of such a standard, a re-projection
service has to be able to parse and interpret several kinds of popular formats. Consequently, in
order to understand the geo-referencing information associated with an image, two pieces of information are needed: the format of this information, and the information itself.
When the geo-referencing information is specified in a separate file, there are several methods
for sending this file or its contents to the re-projection service. For example, one can send the URL
of the geo-referencing information file to the server along with the image. In this case, the server
makes two requests: one for the image itself and one for its geo-referencing information. The
advantage of this approach is that it is consistent with the one selected for the image element of the
request and can also be constructed on demand with its associated image. However, this approach
still requires the server to understand and manipulate several formats of these geo-referencing
files. Alternatively, one can extract the needed information from the file and send it using newly
defined keys. This approach in fact calls for standardizing the way the geo-referencing information
is communicated among servers. The new tags or keys consist of pairs of pixel locations and their
corresponding geographic coordinates.
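As an illustration of the first option, the contents of a world file (six numbers defining an affine pixel-to-geography transform) can be parsed mechanically. This is a sketch with hypothetical values and no input validation:

```python
def parse_world_file(text):
    """Parse a six-line world file (.tfw, .jgw, .wld, ...) into a function
    mapping a pixel (col, row) to geographic coordinates (x, y).
    Line order: x-scale, y-rotation, x-rotation, y-scale, x-origin, y-origin."""
    a, d, b, e, c, f = (float(v) for v in text.split())
    return lambda col, row: (a * col + b * row + c,
                             d * col + e * row + f)

# Hypothetical world file: 2-unit pixels, origin at (100, 500), no rotation.
to_geo = parse_world_file("2.0\n0.0\n0.0\n-2.0\n100.0\n500.0")
```

A service supporting several such vendor formats would need one parser per format, which is exactly the burden a standardized representation would remove.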
- The original and target reference systems: Specifying the values for the reference systems
is a controversial issue. Although it is easier for the users to define them as abbreviated strings
(such as NAD83), currently there is no standard way of representing these strings. In order to
avoid inconsistencies in chaining with WMT-compliant map servers, we decided to follow the
WMT representations of reference systems, which were agreed on after an extensive debate
among the group's participants. Accordingly, we will henceforth use the format namespace:projection identifier. A commonly used namespace is EPSG,1 which includes tables that
define numeric codes for many projections and associate projection and coordinate metadata for
each identifier. Namespaces can also be user-defined or vendor-specific.
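Splitting such identifiers is mechanical; a minimal sketch, using EPSG:26986 (the Massachusetts State Plane code that appears in the examples later in this chapter) as input:

```python
def parse_srs(srs):
    """Split a WMT-style reference system string 'namespace:identifier'."""
    namespace, sep, identifier = srs.partition(":")
    if not sep or not identifier:
        raise ValueError("expected namespace:identifier, got %r" % srs)
    return namespace, identifier
```

Rejecting bare strings such as NAD83 is deliberate: without a namespace, two services cannot be sure they mean the same projection tables.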
- The resampling algorithm used: This parameter provides users and services with the flexibility of choosing a resampling algorithm that is appropriate for their specific needs.
- The pixel width and height of resultant image: These parameters allow users and services
to specify the pixel dimensions of re-projected images. An alternative way to convey the same
information is to use a zoom parameter, which defines the geographic extent covered by a single
pixel.
- The returned image format and the geo-referencing type: These are needed when users or
services request the returned image and/or its geo-referencing information in formats that are different from the originals. This presents a problem since, in response to each CGI request, the
server can only return one result/file to the caller. Therefore, per CGI call, the server cannot send
back the image and the geo-referencing information as separate files. One way to solve this problem is to encode these two pieces of information in one XML document. Indeed, the OpenGIS Consortium is currently involved in defining a Geographic Markup Language (GML),2 an XML-based standard for representing geographic data. Another alternative is to use GeoTIFFs as a standard for geo-referenced raster imagery, as they embed coordinate information in the header of the tif file.
- A preferred return error or exception format: Consistent with the WMT approach, a standard way of communicating errors to the requesting client, and consequently to the user, is provided. An error is expressed either in the form of an XML document, in which the error and its severity are described, or in the same data type expected by the client (i.e., if the client requested the image in jpg format, then the user receives a jpg image with an indication of the error). Returning the exception in XML format is particularly useful when services are chained because it allows a meaningful propagation of errors throughout a chain to the user. Image exceptions, on the other hand, are appropriate when the client is expecting to receive an image regardless of the success of the underlying operation. Depending on the design of the client and the chain of services, a suitable exception format can be selected.
1. At http://www.petroconsultants.com/products/geodetic.html, EPSG has compiled a large number of coordinate systems and assigned a unique integer code to each one of them.
2. The Geographic Markup Language (GML) is an XML encoding for the transport and storage of geographic information, including both the geometry and properties of geographic features.
- Other inputs: In addition to the basic parameters described thus far, other inputs might be
needed such as version numbers, user/projection specific parameters, or vendor-specific inputs
that can be used to optimize the performance of a server when it is used in conjunction with a particular client. Of course, it is desirable to minimize the use of vendor-specific inputs in order to
maintain interoperability of services. Within the WMT setting, these inputs provide vendors with
an opportunity to experiment with additional parameters. If these prove to generally improve the
design or the performance of the service, they may later be incorporated into the standard. Another
miscellaneous input might be used to store the color assigned to the pixels on the border of transformed images (See examples in Appendix C.5.3).
Table 3.1 summarizes the parameters to the ReProjImage interface discussed thus far.
3.3.3 GetCapabilities Request
In a distributed environment, it is essential to have services follow a consistent scheme for
describing their capabilities to interested clients. In the WMT context, this was accomplished
using the GetCapabilities interface, which, upon request, returns to clients the characteristics of a
server in an XML document. Querying servers for their capabilities through the GetCapabilities
interface allows clients to decide, before using the service, whether it is suitable for their purposes.
In Table 3.2, we list the characteristics most likely to be of interest to clients requesting a re-projection service.
- Image URL (imageURL): Location of image to be transformed. Required.
- Image Format (format): Format of image. Required unless the extension of the image allows the extraction of the format.
- GeoRef Type (gType): Type of associated geo-referencing information. Required (values include GeoTIFF, world, tab).
- GeoRef Info (gInfo): Either (1) the URL of the geo-referencing information, which the server retrieves itself, or (2) the needed information extracted and sent using new keys/tags holding selected pixel locations and their associated geographic information on the image; at least 3 pairs are required for proper registration of the image. Required unless the image is a GeoTIFF.
- Width & Height (width, height): Pixel width and height of transformed image. Required.
- Source SRS (fromSRS): Namespace:projection identifier. Required.
- Target SRS (toSRS): Namespace:projection identifier. Required.
- Resampling Method (resampling): Resampling method picked by client. Optional, defaults to nearest neighbor or approximation.
- Output Image Format (rFormat): Format of output image. Optional, defaults to that of input image.
- Output GeoRef Type (rGtype): Type of geo-referencing information of output image. Optional, defaults to that of input image.
- Exception Format (exception): Can be either an XML document or the same return type as a successful request. Optional, defaults to the regular data type expected by the client (debated in WMT).
- Other Inputs: Version number, vendor-specific parameters, border color, etc. To be determined.
Table 3.1: Preliminary raster re-projection service query parameters.
- Input and output projections supported by service: This information can be supplied in one of two forms. The server can return two lists, one for supported source projections and one for supported target projections; in this case, the assumption is that ReProjImage handles the transformations from all source projections to all target projections. Alternatively, the server can return a list of pairs of the source and target projections it supports; the only disadvantage of this method is that the list of pairs might get too long when a particular service supports a large number of re-projections, which implies a higher bandwidth needed to retrieve the capabilities.
- Image formats supported by service: This can be returned as a list of the image types supported.
- Geographic information types supported by service: Clients need this information to determine which format to send the data in, and to determine what options are available for the return formats.
- Exception formats supported by service: Used to indicate whether the server returns the data in XML or not.
- Resampling techniques supported by service: Since re-projection servers are not expected to be able to implement all available resampling techniques, this piece of information is particularly useful to clients.
- Relative speed or accuracy at which service performs the re-projection: This information might be of interest to clients when choosing among several re-projection services available to them. It is expected that dedicated re-projection servers will be hosted on powerful workstations, hence considerably affecting the speed of service delivery. Similarly, some servers might provide faster service by applying approximations to the data, instead of applying more accurate yet more time-consuming transformations.
Table 3.2: Re-projection service capabilities parameters.
An illustrative example of a DTD corresponding to the above listing of capabilities consists of

<!ELEMENT Capability (ProjectionPair+, ImageFormat+, GeoInfo+, Resampling+, Speed?, Exception?)>
<!ELEMENT ProjectionPair EMPTY>
<!ATTLIST ProjectionPair
    Source CDATA #REQUIRED
    Target CDATA #REQUIRED
    Reverse (yes|no) "yes">
<!ELEMENT ImageFormat EMPTY>
<!ATTLIST ImageFormat Name (gtif|tif|gif|jpg|png|other) #REQUIRED>
<!ELEMENT GeoInfo EMPTY>
<!ATTLIST GeoInfo Type (tab|world|geotiff|other) #REQUIRED>
<!ELEMENT Resampling EMPTY>
<!ATTLIST Resampling Method (nearest|bilinear|cubic|other) "nearest">
<!ELEMENT Speed EMPTY>
<!ATTLIST Speed Value (slow|normal|fast) "slow">
<!ELEMENT Exception EMPTY>
<!ATTLIST Exception Format (default|XML) "default">
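On the client side, deciding whether a service is suitable reduces to parsing such a capabilities document. The following sketch uses Python's standard XML parser on a hypothetical response; the element and attribute names are illustrative, not mandated by the interface:

```python
import xml.etree.ElementTree as ET

# Hypothetical GetCapabilities response for a re-projection service.
capabilities = """<Capability>
  <ProjectionPair Source="EPSG:26986" Target="EPSG:4269" Reverse="yes"/>
  <ProjectionPair Source="EPSG:26986" Target="EPSG:32619" Reverse="no"/>
</Capability>"""

root = ET.fromstring(capabilities)
pairs = set()
for p in root.iter("ProjectionPair"):
    pairs.add((p.get("Source"), p.get("Target")))
    if p.get("Reverse") == "yes":   # transformation also works backwards
        pairs.add((p.get("Target"), p.get("Source")))

# Before issuing a ReProjImage request, check the desired transformation:
suitable = ("EPSG:26986", "EPSG:4269") in pairs
```

A chaining client would run this check against every candidate re-projection service and pick one that advertises the needed pair.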
Examples for using both the ReProjImage and the GetCapabilities interfaces are included in
the next section, which summarizes our experience with developing a working raster re-projection
service.
3.4 Re-projection Service: a Prototype Implementation
The most practical way to evaluate and refine a preliminary design is through a prototype implementation of that design. In this section, we describe our attempt to build a raster re-projection service prototype complying with the interfaces described thus far. The prototype is tested in two
settings: functioning as a stand-alone service and interacting with map servers. The section highlights the unanticipated difficulties faced in this phase and how they allowed us to gain valuable
insights into some practical constraints and requirements related to distributed GIS.
We remind the reader that the objective of this effort does not involve implementing a 'perfect'
re-projection service that is capable of handling a wide range of transformations, formats and
interpolation techniques. Nor is the objective to deliver re-projection through newly devised algorithms. For this reason, our implementation is almost exclusively based on using and/or adapting
existing re-projection tools and components. The goal for the initial implementation is for the prototype to support a meaningful subset of the input variations, which can later be easily expanded to
create the 'perfect' service. However, as we show later, our decision to use available tools and
components introduced its own set of problems and constraints, transforming our initially anticipated simple implementation step into a complex task. Nonetheless, the unanticipated obstacles
encountered are indicative of the limitations of current GIS technologies, and hence shed light on
areas that require improvement.
3.4.1 Implementation Options
There are many tools and components available in the GIS field for performing raster re-projection. Understandably, most of these tools have interfaces that are optimized for their internal
operations and proprietary structures. Our decision to use ArcInfo 6.0 with RPC as the back-end
server despite its many problems (such as licensing restrictions and slow raster projection speed)
was a result of its availability for testing at the time of the prototyping, its support for a wide range
of image formats, the ease of running it as a server, and our familiarity with its operations. Other
choices available to us included:
- Using portions of proprietary GIS systems, like the ArcInfo ODE (Open Development
Environment) interface: The ODE environment is designed to allow developers to pick and customize subsets of ArcInfo functionality, which can then be treated as independent components.
Indeed, ODE allowed us to access and use the re-projection functionalities of ArcInfo from outside
the ArcInfo environment. The only problem we encountered during that process was the enormous
amount of memory needed (30 MB on average on a PC or workstation) for each re-projection call,
even for relatively small images. This implied that the prototype would lack scalability whenever the available memory is insufficient for handling simultaneous re-projection requests. In addition,
being based on ArcInfo, whose raster re-projection process is often slow, the provision of the service was also slow. Table 3.3 provides some sample re-projection times for gray scale images.
Finally, the ODE option was also restrictive in terms of licensing terms and fees, making the solution rather expensive in a distributed setting where the service is continually used by multiple clients.
Image Size    UltraSparc IIi (333 MHz, 128 MB memory)    Dell 410 workstation (dual 550 MHz Pentium IIIs, 512 MB memory)
2 KB          2 sec.                                     2 sec.
27 KB         5 sec.                                     3 sec.
72 KB         30 sec.                                    18 sec.
300 KB        32 sec.                                    19 sec.
631 KB        1 min. 25 sec.                             45 sec.
10 MB         > 1 hour                                   20 min.
Table 3.3: Sample re-projection times using ArcInfo.
- Using portions of open source GIS code, like GRASS: Written in C and available for free, using portions of the GRASS code would relieve us from any licensing constraints that are associated with commercial proprietary systems. Unfortunately, the GRASS alternative was not without
problems. On one hand, the re-projection module r.proj is known to be slow and only re-projects
data in the GRASS raster format. On the other hand, we encountered some difficulties in the process of extracting the re-projection code from the surrounding GRASS code and environment,
given the tight interconnection between the two.
- Using available proprietary re-projection components, like the BlueMarble Geographic
GeoTransform components: BlueMarble Geographic (www.bluemarble.com) offers a simple
solution in the form of two geographic transformation DLLs that can be incorporated by developers in any application. Although attractive, this option suffers from licensing problems as we try to
distribute these components on the Internet.
- Writing our own re-projection code: There always remains the alternative of writing our
own re-projection code from scratch. The attractiveness of this option lies in the possibility of discovering faster and more efficient algorithms for re-projection in the process. However, as this is
not the focus of this study, this approach was not favored given the time and resource requirements for such an endeavor.
The next section provides an overview of the final prototype developed.
3.4.2 Prototype
A flowchart of the prototype developed is shown in Figure 3.5. A more complete listing of its
capabilities and limitations is included in Appendix C.4. A sample call to this service is shown
below. It includes a request to re-project a jpg image (shown in Figure 3.2) from the Mass State
Plane to the Lat/Lon reference system. The output of the request is shown in Figure 3.3.
ReProjImage.cgi?imageURL=http://tull.mit.edu/test.jpg&gType=world&gInfo=
http://tull.mit.edu/test.jgw&fromSRS=EPSG:26986&toSRS=EPSG:4269&resampling=
bilinear&width=500&height=500 1
1. In practice, the imageURL would have to be encoded in this request.
Figure 3.2 Original jpg image in the Mass State Plane reference system.
Figure 3.3 Image re-projected into lat/long using ArcInfo.
Our implementation worked well in terms of achieving our prototyping goals. However, it displayed several performance limitations, many of which were a result of our decision to use
ArcInfo version 6. For example, some performance problems were linked to our use of RPC
(Remote Procedure Call) methods to access ArcInfo's re-projection commands. With the RPC
connection, ArcInfo can only handle one call at a time, which implied that any re-projection
request received at the server had to wait in a queue until all requests ahead of it were processed.
The resultant delay in processing simultaneous or overlapping requests was a clear disadvantage of
the ArcInfo RPC solution. Requests were queued in the order they arrived, regardless of the size of
the image or its estimated processing time.
The other shortcoming of using ArcInfo was related to the software's inability to stream
images, hence requiring every image to be fully downloaded to disk before initiating the re-projection sequence. Furthermore, ArcInfo's re-projection modules were relatively slow when handling
large images (see Table 3.3), a serious disadvantage in a distributed setup where the combination
of reliable performance and fast response is valuable. This is especially true in cases of service
chaining where a delay in one service will affect the performance of the overall chain. On one
hand, the poor performance of the re-projection modules can be generally attributed to the underlying algorithms used in ArcInfo. On the other hand, the fact that these modules performed re-projection on ArcInfo grids only, dictated that we introduce time-consuming image-to-grid and gridto-image conversions in order to be able to manipulate and re-project the images 2
However, after observing the prototype's input/output patterns for several examples, we concluded that we could significantly improve its speed if we were willing to accept some loss in the
accuracy of the results. This can be achieved by substituting the exact yet slow re-projection step
with another one that delivers a tolerably approximate result in a much shorter time. The approximations were obtained by applying linear transformations to the images. These transformations
were derived from the pixel and geographic dimensions of the image in the source and target reference systems (see Appendix C.5). Figure 3.4 shows the result of our approximate algorithm
applied to the image in Figure 3.2. Comparing it with Figure 3.3, we argue that the difference
between the approximate image and the exact one is barely noticeable to the naked eye. In Appendix C.5, we show additional examples of exact-approximate pairs of re-projected images. All these
examples collectively demonstrate that for most typical applications involving browsing small
areas at a time, the trade-off between speed and accuracy will be more than acceptable.
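The speed-for-accuracy substitution can be sketched as follows. Once the target bounding box has been obtained by exactly projecting only the image corners, the mapping from target pixels back to source pixels is assumed linear, so re-projection collapses to a nearest-neighbor resampling of the grid. This is a simplification of the transformations in Appendix C.5, which are derived per image pair:

```python
def approx_reproject(src, dst_w, dst_h):
    """Linear approximation of re-projection: instead of inverse-projecting
    every target pixel, assume the pixel mapping is linear across the image
    and resample the source grid (nearest neighbor) into the target size."""
    src_h, src_w = len(src), len(src[0])
    return [[src[min(int((j + 0.5) * src_h / dst_h), src_h - 1)]
                [min(int((i + 0.5) * src_w / dst_w), src_w - 1)]
             for i in range(dst_w)]
            for j in range(dst_h)]
```

The cost per output pixel drops from an evaluation of the projection equations to cheap index arithmetic, which is the source of the speed gain measured in our experiments.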
1. Besides the ODE approach, RPC is the only way ArcInfo can be set up to run as a server.
2. After the release of ArcInfo 8 with improved projection code, we found that the grid-to-image and image-to-grid conversions remain the bottleneck in the re-projection process. ArcInfo 8 introduces a new command, projectGrid, which applies a best-fit polynomial to re-project a grid, rather than projecting every individual pixel.
Figure 3.4 Image re-projected to lat/lon using approximation method.
In the next section, we summarize the results of chaining the prototype re-projection service
with the MITOrtho server (introduced in Section 1.2.2).
3.4.3 Chaining Prototype with MITOrtho Server
In order to familiarize ourselves with the intricacies of service chaining, we experimented with
chaining our re-projection service prototype with the MITOrtho server. As explained in Section
1.2.2, the MITOrtho server provides an interface for extracting customized geo-referenced images
of the greater Boston area. Currently, the MITOrtho server is limited to serving images in the Mass
State Plane reference system, the reference system in which the images are stored. Our goal was to
build a new service, MultiProjOrtho, that can deliver the images in other reference systems
selected by users. The MultiProjOrtho service can be viewed as an intermediate service coordinating between the re-projection service and the ortho server, as shown in Figure 3.6. It allows users
to download chosen images by providing the desired SRS for the image, the geographical bounding box expressed in units of that SRS, the image's data format and its pixel dimensions. As shown
in Figure 3.6, both the re-projection step and the call to the MITOrtho server are completely
transparent to the caller of the service. Our comments regarding this elementary chaining experiment are presented in the next section which synthesizes our observations and findings.
-
Image URL
Geo Info
from SRS
Reprojected Image +/- geo-referencing information
ReprojRaster
Service
to SRS
Resampling
OutputZoom
~
proj.aml
Read Input Parameters (*)
Check for Errors (1)
r
-
Convert EPSG codes into
ArcInfo terminology
4-
Download image and save
it to disk (2)
Construct a .prj file for the
current projection (5)
Connect to ArcInfo via
RPC (coded in C) (3)
Transform input image to
grid (6)
Run proj script in ArcInfo
(coded in aml)
Project grid
Prepare headers and send
reprojected image to caller
(4)
Transform the reprojected
grid (or stack) to an image
L--------------------
Clean up the intermediary
files and images
Figure 3.5 Prototype re-projection service: Internal flowchart.
All parts were coded in per] except when indicated.
1. This stage includes checking for missing parameters and detecting inconsistent information. This is done to ensure a consistent behavior in
case of errors. Consistent yet sometimes forgiving behavior becomes a necessity when considering that services will be continuously chained
and interacting.
2. Download the geoinfo if required. Because of ArcInfo's inability to deal with streamed images, the image has to be fully on disk before it can
be manipulated.
3. A new connection is created for each reprojection request. When multiple requests are received, ArcInfo queues the requests and processes
them in the order they were received.
4. The default is to send only the image (jpg, tif or gif) to the user of the service. When the user requests the geo-referencing information of the
reprojected image to be returned as well, the service can either return a geotiff or an xml document (containing both the image and the
geoinfo), depending on the user's choice.
5. A .prj file contains all the projection parameters needed for ArcInfo to project the current image.
6. In case of multiband images, ArcInfo transforms the image into a stack of grids, one for each band. Bands are reprojected separately and then
combined.
[Figure: interaction flowchart for the MultiProjOrtho service. Client inputs: bounding box (gxmin, gymin, gxmax, gymax), SRS, format, and pixel width and height (pwidth, pheight).

Preparation step: determine the URL of the image on the ortho server that corresponds to the image requested by the user. In order to do that, we have to determine the equivalent values for the bounding box and the pixel width and height of the image on the ortho server. The four corners of the requested bounding box are projected using the Proj utility, yielding the enclosing box (minX, minY, maxX, maxY). If the units of measurement are the same for the requested SRS and the ortho server SRS, then the zoom of the image (geographic distance per pixel) remains constant across the SRSs:

    zoomX = (gxmax - gxmin) / pwidth    =>  newWidth  = (maxX - minX) / zoomX
    zoomY = (gymax - gymin) / pheight   =>  newHeight = (maxY - minY) / zoomY

The image URL constructed from these values (format = geotiff, bounding box (minX, minY, maxX, maxY), pixel dimensions (newWidth, newHeight)) is passed to the ReprojRaster service, which retrieves the image from the Mass Ortho server and returns the reprojected image.

Extraction step: since we obtained an area larger than the one requested, we need to extract the area of interest from the image returned by the re-projection server. After reading the geographic width and height of the returned image (using its geotiff header) and knowing the actual geographic and pixel width and height of the requested image, we can easily find and cut the area of interest (using tiffcut) and return it to the user. The cropped image is then returned to the client.]
Figure 3.6 Chaining re-projection service with MITOrtho server: Interaction flowchart.
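The coordinate arithmetic of the preparation and extraction steps in Figure 3.6 can be sketched as follows; the `project` callable is a hypothetical stand-in for the Proj utility:

```python
def request_params_for_ortho(project, bbox, pwidth, pheight):
    """Map a requested bounding box (in the client's SRS) onto the
    ortho server's SRS, preserving geographic distance per pixel.

    `project` converts an (x, y) point from the client SRS to the
    server SRS (a stand-in for the Proj utility).
    """
    gxmin, gymin, gxmax, gymax = bbox
    # Project the four corners and take the enclosing rectangle;
    # re-projection warps the box, so this area is a superset of the
    # requested one (hence the later extraction step).
    corners = [project(x, y) for x in (gxmin, gxmax) for y in (gymin, gymax)]
    xs, ys = zip(*corners)
    minX, minY, maxX, maxY = min(xs), min(ys), max(xs), max(ys)

    # Same units on both sides => the zoom (geographic distance per
    # pixel) stays constant across the SRSs.
    zoomX = (gxmax - gxmin) / pwidth
    zoomY = (gymax - gymin) / pheight
    newWidth = (maxX - minX) / zoomX
    newHeight = (maxY - minY) / zoomY
    return (minX, minY, maxX, maxY), round(newWidth), round(newHeight)

# With an identity projection the enclosing box equals the request,
# so the pixel dimensions come back unchanged.
box, w, h = request_params_for_ortho(lambda x, y: (x, y),
                                     (0.0, 0.0, 100.0, 50.0), 200, 100)
```

For a real projection the enclosing box grows, and the extraction step cuts the requested window back out of the larger returned image.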
3.5 Synthesis of Observations and Findings
This section provides a synthesis of the lessons learned from our prototyping effort, vis-a-vis the
challenges of realizing a flexible and scalable distributed geo-processing infrastructure that is also
easy to manage and use. Most difficulties encountered in this effort were the result of the limitations of legacy GIS systems and the lack of standards for geo-referenced images and SRS representations. We classify our observations and findings into four categories:
1. Standards and interoperability issues
2. Inherent GIS design issues
3. Distributed infrastructure and service chaining issues
4. Distinctive characteristics of the GIS problem.
When appropriate, the discussion of the above issues is linked to related issues previously
identified in the first two chapters, especially those pertaining to the range of users and applications, the available enabling technologies and the related ongoing interoperability efforts. The synthesis of all these issues will lead, in Chapter 5, to the construction of an analysis framework
within which we outline infrastructure requirements, identify key elements and players, and compare candidate architectures.
3.5.1 Standards and Interoperability Issues
Our prototyping experience revealed the lack of standards for at least two operations: the representation of projection systems and the encoding of geo-referenced raster imagery.
As a result of the lack of a standard representation of SRSs, the prototyping effort required
conversion to and from three different forms of representation, namely the EPSG codes,
ArcInfo's internal string representations and the representation used by the Proj utility¹ (from the
cartographic projections library). In order to ensure minimum functionality for the prototype, it
became our responsibility, as developers, to perform the back-and-forth SRS translations across
these representations. We found it necessary to limit the set of supported projections in order to
avoid transforming our main task of building an interoperable service into that of building a super
translator of projection representations. The list of supported projections is shown in Table C.1.
It is interesting to note that, in the OpenGIS testbeds, the EPSG codes have been heavily used,
almost to the point of becoming the de facto standard. Yet, GIS vendors such as ESRI, who have
been active participants in these testbeds, have not yet implemented mechanisms for their software
1. http://www.remotesensing.org/proj/
to interpret these codes. This observation reinforces the discussion of a standard's acceptance
(Section 2.2.2) as a critical pre-requisite to its success. Without standards, interoperability
becomes fragile and could easily break with new technological advancements. A distributed infrastructure built under these conditions would be hard to scale. This issue emphasizes the importance of consensus-based organizations, such as OpenGIS, because of their role in establishing
standards.
Our prototyping experiment also highlighted the lack of a standard for representing geo-referenced raster imagery. At the time of the implementation, the GeoTIFF format appeared to be the
de facto standard in that area, as most GIS software can read and produce images in variations of
that format. However, going forward, we believe that an XML-based format similar to the currently-under-revision GML (Geography Markup Language) would better fit the extensibility
requirements of the distributed environment. An XML-based format is also better aligned with the
current approach of the industry as represented by OpenGIS. The examples¹ below illustrate how
an XML-based format allows for a "clean" separation between the image and its geo-referencing
information. The latter can either be represented inline in the XML document or as an XLink pointing to a file containing that information. By allowing the geo-referencing information to be independent of the associated image format, the XML structure of the information offers more
flexibility to users. The XML documents can easily be filtered by users to extract only the
information they are interested in for their specific purposes.
<GeoReferencedRasterImage>
  <Content type="image/gif"
           xlink:href="http://my.server.net/image.gif"/>
  <GeoInfo>
    <SRSName> EPSG:26986 </SRSName>
    <width> 500 </width> <height> 500 </height>
    <xcenter> 231000 </xcenter>
    <ycenter> 970000 </ycenter>
    <gwidth> 230000 </gwidth>
    <gheight> 90000 </gheight>
  </GeoInfo>
</GeoReferencedRasterImage>

Inline geo-referencing information

<GeoReferencedRasterImage>
  <Content type="image/gif"
           xlink:href="http://my.server.net/image.gif"/>
  <GeoInfo type="world"
           xlink:href="http://my.server.net/image.gfw"/>
</GeoReferencedRasterImage>

Pointers to external geo-referencing information

Figure 3.7 Encoding geo-referenced imagery using XML.
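The filtering point above can be made concrete with a few lines of parsing code. This is a sketch using a simplified `href` attribute instead of the namespaced `xlink:href`, with element names following the inline example:

```python
import xml.etree.ElementTree as ET

# Simplified version of the inline-geo-referencing document above
# (namespace declarations omitted for brevity).
DOC = """\
<GeoReferencedRasterImage>
  <Content type="image/gif" href="http://my.server.net/image.gif"/>
  <GeoInfo>
    <SRSName>EPSG:26986</SRSName>
    <width>500</width> <height>500</height>
    <xcenter>231000</xcenter> <ycenter>970000</ycenter>
    <gwidth>230000</gwidth> <gheight>90000</gheight>
  </GeoInfo>
</GeoReferencedRasterImage>"""

root = ET.fromstring(DOC)
# A client interested only in the reference system and the image's
# location picks out just those elements and ignores the rest.
srs = root.findtext("GeoInfo/SRSName").strip()
image_url = root.find("Content").get("href")
```

Because the geo-referencing information sits in its own subtree, clients can extract only what they need without understanding the image format itself.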
We should point out that, at the time of this writing, a group within the WMT² (see Section
2.3.4) is working on a draft of coverage extensions for GML that cover the case of geo-referenced
raster imagery. Our impression of the draft is that it tries to be so comprehensive that it risks
1. These XML-based examples are very preliminary. Practical usage would require additional elements and
attributes within this basic structure.
becoming too complex even for simple cases like ours. It remains to be seen if, when, and for what
purposes the specification will eventually be adopted. Our preview of this standard under construction reinforces our understanding of the challenges of the standards development process (Section 2.2.2), especially those related to timing and scope.
3.5.2 Inherent GIS Design Issues
In addition to the lack of standards, building a prototype was also complicated by several limitations of current GIS systems in terms of accommodating a distributed multi-user setup.
Although we encountered the obstacles while using ArcInfo, our experience with other systems
shows that the limitations can be generalized to most legacy GIS software. An example of such
limitations is the inability of ArcInfo (and most of today's standard GIS packages) to provide the
functionality of connecting to data (and services) over the Internet through a TCP/IP connection.
As an integral part of the operation of the prototype, this functionality had to be separately coded
in Perl (see Figure 3.5). Additionally, the lack of modularity in the GIS system¹ made it difficult to
isolate and independently access ArcInfo's re-projection modules (even using ODE, as discussed
in Section 3.4). This limitation later jeopardized the prototype's scalability and robustness when
dealing with concurrent re-projection requests.
Given the traditional uses of GIS as stand-alone systems, these limitations are understandable.
The new functionalities required by the distributed setup are stretching the capabilities of the traditional systems beyond what they were originally designed to do. Indeed, the single-user presumption explains ArcInfo's non-reentrant code and personal workspace characteristics, as well as the
underlying assumption that all data are available and accessible locally from within these workspaces. Consequently, in order for us to use such a system in a distributed, multi-user setup, users'
requests had to be queued in the order they arrived, as opposed to being processed in parallel.
This setting implies a longer average waiting time for the users. Likewise, images had to be fully
downloaded to disk before starting the re-projection operations, as opposed to being streamed in
and incrementally processed.
Given these limitations, there is a real necessity for a fundamental change in the design of GIS
software for it to be practical in a distributed environment. Indeed, we are beginning to see this
trend emerging. For instance, ESRI is releasing a redesigned object-oriented version of ArcInfo,
1. Over the last couple of years, ArcInfo has been completely redesigned and redeveloped to leverage
object-oriented and component-based technologies. However, the fact that the project took more than a year
longer than expected by its team suggests that the original system (the one used for our work) was very interleaved and non-object oriented.
and new players, such as Blue Marble Geographics, are filling the need for simple specialized GIS
components. Furthermore, the practicality, ease-of-use and sustainability of the distributed infrastructure are imposing minimum performance and service delivery requirements. These requirements can best be addressed by services designed to support multi-user and multi-threaded
requests, streaming data, and caching of query results, among others. The next section takes a
practical look at these issues.
3.5.3 Distributed Infrastructure and Chaining Issues
For a successful wide-based deployment of a distributed geo-processing infrastructure, the
infrastructure needs to meet both technological and ease-of-adoption (by users) criteria. Thus far,
our focus has been on identifying the key technological criteria of scalability, extensibility and
interoperability. As for the issue of ease-of-adoption, its importance became clear to us through
our exposure (albeit limited) to service chaining in Section 3.4.3. We conclude that there are three
key prerequisites to wide-based user adoption of a distributed geo-processing infrastructure:
1. Performance and reliability of services
2. Ease of access to services and their capabilities
3. Simplicity of service chaining.
The three prerequisites are discussed below.
1. Performance and reliability of services
To users of the distributed infrastructure, the performance and reliability of individual and
chained services are critical. In this context, performance denotes the capacity of services to produce desired results with minimum expenditure of time and resources. A common measure of performance for distributed systems consists of the response time of a service and its throughput [87].
As for service reliability, it refers to the availability and trustworthiness of services as well as the
up-to-dateness of any data they serve.
In terms of performance, users will expect the distributed solution to perform at least as well as
their current systems. Even though the distributed setup provides users with the flexibility and the
ability to assemble a solution from a breadth of services, there will be little incentive to effectively
use it if it comes at the expense of an inferior overall performance. The issue of reliable performance deserves special attention in the case of mobile users, whose mobility can quickly make
location-based requests obsolete or inconsistent if they are not processed and delivered fast
enough.
Moreover, in Section 3.4.3, we observed that the overall performance of a chain of independently-provided services is only as good as its "weakest link". The performance of individual services can be improved by optimizing the underlying algorithms or by hosting services (and/or
replicating them) on specialized dedicated machines. For example, the performance of a computationally-intensive, memory-demanding service, such as the re-projection service, can be considerably enhanced if hosted on a multiprocessor machine with a lot of RAM (for handling a larger
number of larger images) and fast input/output pipes (for downloading and returning large
images). The re-projection service case study is a good demonstration of how performance can be intelligently enhanced by applying an algorithm that delivers an approximation of
the re-projection in a fraction of the time required to perform the exact operation. This, however,
might come at the expense of reduced data accuracy and quality, which may or may not be an
acceptable compromise, depending on the data's application context. As each context encompasses different requirements for functionality and service level, there will be other trade-offs,
such as the trade-off between the breadth of functionality provided by a service and its degree of
specialization/depth (see Section 2.2.2), which is also addressed in the next section.
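The "weakest link" observation can be made concrete with a small calculation: along a linear chain, response times accumulate while sustained throughput is capped by the slowest stage. The numbers below are hypothetical:

```python
def chain_performance(services):
    """Aggregate performance of a linear chain of services.

    Each service is given as a pair
    (response_time_seconds, throughput_requests_per_second).
    Response times add up along the chain, while sustained throughput
    is limited by the slowest stage -- the 'weakest link'.
    """
    total_latency = sum(t for t, _ in services)
    bottleneck = min(r for _, r in services)
    return total_latency, bottleneck

# Hypothetical chain: address matcher, re-projection service, map server.
latency, throughput = chain_performance([
    (0.2, 50.0),   # address matching: fast, high throughput
    (3.0, 2.0),    # re-projection: slow, the bottleneck
    (0.8, 10.0),   # map server
])
```

Here the chain's throughput collapses to that of the re-projection stage, which is why optimizing or replicating the slowest service pays off more than tuning the others.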
In terms of reliability, data authenticity deserves special attention. It is easy to see why users'
need for data authenticity favors centralized or federated architectures where services are registered, certified and rated by designated authorities. The need for data authenticity also advocates
strongly-branded established organizations and trustworthy service providers. Chapter 4 will
explore these issues in greater detail.
2. Ease of access to services and their capabilities
Another practical consideration of using the distributed infrastructure is the ease with which
users are capable of accessing and locating the data and services that they need.
The ease of access to services is proportional to the degree of interoperability and simplicity of
the services' interfaces. Indeed the interoperability of the services guarantees, by construction, that
they are easily interchangeable by users in applications. Section 2.2 covered some of the challenges embedded in achieving this objective. As for the simplicity of the services' interfaces, it is a
particularly desirable feature for the expanding non-professional GIS user base. However, as
described in Section 2.2.2, simplicity involves its own trade-offs. If an interface is simple enough
to be used by a wide range of users for a variety of applications, it might lack the depth
needed for more advanced applications. However, if it is drafted to accommodate this depth of
functionality, it risks becoming more difficult to use for the simpler applications (recall our
discussion of the coverage extension to GML in Section 3.5.1).
The other factor influencing the ease of access to services is the ease with which users can
locate these services and learn about their capabilities. This is where an interface such as the getCapabilities interface (described in Appendix B.2) contributes to the big picture, as a standard
method for querying services about their underlying functionality and data. Within the context of
our earlier discussion about performance issues, this interface could also be used by services to
advertise their degree of accuracy, their speed, their current load, and other performance-related
metrics.
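As a sketch of this idea, a capabilities response might carry performance metadata alongside the supported operations. The element names below are our own invention, not part of any published capabilities schema:

```python
def capabilities_xml(operations, accuracy_m, current_load, est_seconds):
    """Render a getCapabilities-style response extended with
    performance metadata (element names are illustrative assumptions)."""
    ops = "\n".join('    <Operation name="%s"/>' % o for o in operations)
    return """<Capabilities>
  <Operations>
%s
  </Operations>
  <Performance>
    <AccuracyMeters>%g</AccuracyMeters>
    <CurrentLoad>%g</CurrentLoad>
    <EstimatedResponseSeconds>%g</EstimatedResponseSeconds>
  </Performance>
</Capabilities>""" % (ops, accuracy_m, current_load, est_seconds)

# Hypothetical advertisement: a re-projection service reporting its
# accuracy, current load and estimated response time.
doc = capabilities_xml(["ReProjImage", "getCapabilities"], 0.5, 0.7, 12)
```

A client choosing between interchangeable services could compare such responses and route a request to the least-loaded or most accurate provider.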
As with any distributed environment, it will become increasingly difficult to locate and keep
track of services as the number of available services grows. In Chapter 4, we address how this
problem can be alleviated by introducing facilitator services (such as catalogs, directories, search
engines, etc.) which can be queried and browsed by clients to locate what they need.
3. Simplicity of service chaining
As discussed in Section 1.2.3, service chaining is the main source of complexity to the users of
a distributed geo-processing infrastructure. In order to make the infrastructure usable to as wide an
audience as possible, it should be possible to hide the complexity of chaining. In this chapter, we
saw how the design of the ReProjImage interface facilitated its chaining with map services, such
as the MITOrtho server, by providing the image parameter as a URL. This URL can be used to
request a customized map from the map server. Nevertheless, as the number of services grows in a
chain, it becomes more difficult to keep track of various service interfaces, capabilities, locations
and intermediate results, among other factors. In this case, intermediary mediating services can be
introduced in order to coordinate between the user's preferences and the capabilities of available
services, and to transparently construct the chain of services that matches the client's needs. In this
context, the mediating services handle the dialogs among the involved services and the exchange and
transformation of data, and keep track of metadata and accounting. In doing so, the mediating services achieve the objective of hiding some of the details of service chaining from the user.
The next chapter covers these mediating services, their roles and their variations in detail.
3.5.4 Distinctive Characteristics of the GIS Problem
Our exposure to service design and chaining in this chapter, together with the background
research covered in Chapter 2, allowed us to ascertain some distinctive characteristics of the GIS
problem, which differentiate it from other distributed infrastructures. Those characteristics are
listed in Table 3.4, and are discussed in detail in this section.
Characteristics Extracted from          Characteristics Extracted from
GIS Literature                          Prototyping Experience
--------------------------------------  --------------------------------------
Underlying complex data structures      Legacy starting point
Distributed, multi-disciplinary         Coordinate transformations
  nature of data
Diversity of spatial information        Metadata
Interdependence of geo-entities         Interactive use of data
Complexity of operations                Interest in archived data
Semantics                               Size of data (length of transactions)

Table 3.4: Distinctive characteristics of the GIS problem.
The GIS literature attributes the uniqueness of the GIS problem to the distributed and multidisciplinary nature of geographic information [46], as well as the complexity of the multi-dimensional data structures used to represent that information [45]. The complexity of the operations
performed on spatial data is further exacerbated by the interdependence of geo-entities and the
propensity of proximate locations to influence each other and possess similar attributes. For a wide
range of analyses, a single layer is often not sufficient, and its usefulness is greatly enhanced when
linked or merged with other sets. Even when these sets are interoperable, semantic heterogeneity,
as the literature stresses, is a major limitation to the re-use and sharing of this data. The differences
in semantics and terminologies make it difficult to recognize various relationships that may exist
between similar or related objects from different sources [52].
Our prototyping experience confirmed the general thinking in the literature, from the perspective of looking at the deployment and use of GIS data in the specific context of a distributed geo-processing infrastructure. Furthermore, in many cases, this perspective highlighted certain practical issues that are likely to require special attention in a distributed setting. For instance, in Section
3.5.1, we experienced first-hand the additional effort required to overcome the limitations of GIS
legacy systems, and to accommodate heterogeneous data models and representations.
The prototyping experience also uncovered an additional level of complexity incurred as a
result of projection system transformations. On the one hand, we learned that, even if the interoperability of data is possible, these data are only useful if they are in the coordinate system of the client
requesting them. On the other hand, we happily noted that, with fairly simple transformations
applied to the data (such as changing its format, scale, quality, coordinate system, etc.), the scope
of applications of that data can be significantly broadened. For these reasons, the use of GIS data
lends itself to a more service-based model, whereby data and transformations are independently
selected to create customized solutions.
Another distinguishing characteristic of GIS data derives from the way this data is used and
manipulated by users. For most applications, viewing, zooming and panning across a map is not
enough. Users are typically also interested in the metadata behind the data, the constituent layers
as well as individual features. The process of working with maps is hence quite interactive. Keeping track of the interactivity, the states and the intermediate results is both necessary and challenging. Sometimes, the interaction is even more involved, such as in the case of an address
matching service returning more than one match to an input address (which is not uncommon).
Furthermore, consider the example of a user using a handheld device to access local maps. The
user is unlikely to be satisfied with the first map that he receives. He will also likely request additional layers for the same area, zoom in and out, and pick certain features to obtain more information about. In this setting, the user is repeatedly retrieving tiny subsets of archived data sets, and
may not be concerned about the absolute accuracy of the data, so clever compromises between
performance and accuracy are warranted. Also, subsequent requests are likely to be based on data
already downloaded. Such a localized use of the data calls for the possibility of categorizing data
according to its geographic location and the ability to find information based on its location (such
as proposed in the .geo proposal summarized in Section 2.3.6). Additionally, GIS data can be
returned to users at one of three levels: as a picture, as a collection of features or as a coverage. The
way of interpreting and combining such data, and the resultant complexity, changes depending on
the level used in an application. Finally, compared with other mobile services, such as getting
stock quotes, GIS-based mobile services are likely to require more bandwidth. Accordingly, clever
data compression techniques are called for.
The conclusion is that with GIS data, there is "no size that fits all". That is why flexibility is
greatly needed to serve different types of users, for different purposes, with varying degrees of
interactivity. To corroborate this point, the WMT now has two interfaces: the first is getMap,
which simply returns a map (picture), while the second is getCoverage, which returns the raw
data so that users can manipulate the data without going to the server for each transaction. The key
is to give users enough options for them to mix and match in order to fit their needs.
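Assuming query-string interfaces in the WMT style, the two kinds of request might be assembled as follows; the endpoint and parameter names are illustrative assumptions:

```python
from urllib.parse import urlencode

def wmt_request(base, interface, layer, bbox, srs, **extra):
    """Build a WMT-style request URL for either interface.

    getMap returns a rendered picture; getCoverage returns the raw
    data for client-side manipulation. Parameter names are illustrative.
    """
    params = {"REQUEST": interface, "LAYERS": layer,
              "BBOX": ",".join(str(v) for v in bbox), "SRS": srs}
    params.update(extra)
    return base + "?" + urlencode(params)

BBOX = (225000, 888000, 238000, 905000)        # hypothetical extent
BASE = "http://maps.example.net/servlet"       # hypothetical endpoint

# A picture for display: one request per view.
map_url = wmt_request(BASE, "getMap", "ortho", BBOX, "EPSG:26986",
                      WIDTH=500, HEIGHT=500, FORMAT="image/gif")
# Raw data for local manipulation: no round trip per transaction.
coverage_url = wmt_request(BASE, "getCoverage", "ortho", BBOX,
                           "EPSG:26986", FORMAT="geotiff")
```

The two URLs differ only in the interface name and format, which is precisely the "mix and match" flexibility argued for above.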
In summary, this chapter described our experience with designing and developing a prototype
image re-projection service. This experience reinforced and complemented some of the basic
issues, requirements and challenges of building and using a distributed geo-processing infrastructure. In the next chapter, we use the issues identified thus far to create a framework within which
candidate architectural configurations supporting the distributed infrastructure may be analyzed
and contrasted.
Chapter 4
Architectures: Components, Chaining and
Issues
4.1 Overview
In this chapter, we use the set of issues identified in Chapters 2 and 3 to analyze alternative architectures for facilitating seamless integration of distributed interoperable services. For each architecture, we determine the constituent components, study the resultant chaining process and
highlight the trade-offs involved. In addition, we identify the key players expected to influence the
evolution of these architectures, and the niche markets likely to develop with respect to each of
them.
4.2 Approach
An architecture is defined as the partitioning of a system into major components or independent
modules [104]. Hence, the architectures presented in this chapter are best identified and differentiated by the types of components they consist of, and the distribution of chaining management tasks
among these components. The architectures are presented in an incremental fashion, each building
upon and adding components to those discussed before it. From the client's perspective, each
architecture we present offers a higher level of abstraction, by increasing the transparency of service chaining.
From our discussion in previous chapters, we learned that the needs of different sets of users
are likely to be met by different architectures, depending on the nature of their environment and
the specifics of their tasks. We also saw how the application setting influences the set of choices
and trade-offs that users face. Accordingly, the architectures discussed in this chapter are not
mutually exclusive. Instead, they can and will coexist within different applications and environments. For this reason, we feel it is important to identify the most typical uses for each of the architectures.
To establish a basis for comparing the different architectures, we construct a simple yet non-trivial example, and examine it in the light of each architecture.
4.2.1 Example Scenario and Assumptions
In this chapter, we use the example of a user providing an address to an application and
requesting a geo-referenced image centered at that address.¹ Since our focus is on the components
of the architectures, few assumptions are made about the client. In this case, the client can be an
Internet browser, an application running on a mobile device, or a part of a larger application.
One aspect of this example's simplicity is that the GIS data types handled are limited to raster
imagery. Therefore, the example avoids the additional complexities of heterogeneous semantics
and topology representations. Nevertheless, despite its simplicity, the example is rich enough for
studying architectural issues, and exploring the trade-offs in various service chaining approaches.
The services used in the example belong to three categories:
- An address matching service (e.g. the Etak service introduced in Section 1.2.2): According to
the GeocoderService RFC draft submitted to OGC, an address matching service transforms a
phrase or term that uniquely identifies a feature such as a place or an address into applicable geometry (usually either a coordinate x,y or a minimum bounding rectangle). For simplicity, we assume
that the service used by our client provides (x,y) coordinates in any projection and coordinate system specified by the user.² Typical address matching services return more than just the (x,y) coordinates. Additional information, such as a normalized address, matching precision and location
census tract, is often appended to the coordinates. However, for the sake of simplicity, we assume
that the additional information can be filtered out such that the client receives only the coordinates
in response to a request. In some cases, address matching services return several locations matching a given address. In such cases, user intervention may be necessary to determine the intended
address.
- A map service (e.g. the MITOrtho server): The map server returns a map corresponding to
pre-specified geographic and pixel dimensions of an area. In our example, we use the map and
capabilities interfaces described in Appendix B.
- A re-projection service (e.g. the one developed in Chapter 3): This service is needed because
the native projection of a data set may not be appropriate, depending on the application, the scale
at which the data is requested, and the projection system of other data sets used by the application.
1. For the sake of simplicity, we assume that the size of the image in pixels is fixed.
2. The Etak service returns the coordinates only in latitude/longitude.
[Figure: input/output illustration of the three services.
Re-Projection Service — inputs: imageURL, geoInfo, fromSRS, toSRS, resampling, outputZoom; output: image +/- geo-referencing information.
Map Service — inputs: layer, bounding box, width, height, format, SRS; output: image.
Address Matching Service — inputs: address, SRS; output: (x,y) coordinates.]
Figure 4.1 Illustration of services used in the example.
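A hand-built chain of these services reduces to a few lines of glue code. In the sketch below the two services are stubbed out, and all names and parameters are illustrative:

```python
def fetch_map_for_address(geocode, get_map, address, srs,
                          half_width, half_height):
    """Chain the two services by hand: geocode the address, build a
    bounding box centered on the returned point, then request the map.

    `geocode` and `get_map` stand in for calls to the address matching
    and map services.
    """
    x, y = geocode(address, srs)
    bbox = (x - half_width, y - half_height,
            x + half_width, y + half_height)
    return get_map(layers="ortho", bbox=bbox, srs=srs,
                   width=500, height=500, fmt="image/gif")

# Stub services for illustration: the geocoder returns a fixed point,
# and the map service echoes the bounding box it was asked for.
result = fetch_map_for_address(
    geocode=lambda addr, srs: (231000.0, 970000.0),
    get_map=lambda **kw: kw["bbox"],
    address="77 Massachusetts Ave, Cambridge MA",
    srs="EPSG:26986", half_width=1000.0, half_height=1000.0)
```

In the decentralized architecture discussed next, this glue logic lives entirely in the client; the later architectures progressively move it into mediating components.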
Figure 4.1 provides an input/output illustration of services from the three categories above. We
emphasize that the client in our example is not limited to a particular service from each category.
For instance, the client can access different map servers covering different geographical areas.
Similarly, depending on the application, the client might have the option of using an approximate re-projection service as opposed to an exact one, or accessing a service that specializes in
certain transformations.
In terms of authentication, we assume in our example that the authentication of the client by
the services can be handled using available authentication technologies, such as Kerberos, cookies
or basic http authentication. Similarly, we assume billing for services and data can be managed
using current e-commerce related approaches, such as Ecash (www.ecash.com), CyberCash
(www.cybercash.com) or PayPal (www.paypal.com).
4.2.2 Focus of Analysis
As mentioned in Section 1.2, there is a wide range of issues surrounding the design of a scalable and extensible geo-processing infrastructure. Accordingly, for the example to offer any analytical depth, we need to focus on a subset of these issues. This subset covers the identification of
core components of underlying architectures, as well as their role in service chaining vis-a-vis performance, metadata tracking and error reporting. Additionally, we use the service chaining discussion to address issues of complexity of dialogs and data structures, and also their implications on
the client's thickness and intelligence capabilities.
The focused analysis will serve as a basis for a broader discussion of the issues in the next
chapter. For instance, the analysis will help us understand how the distributed geo-processing
infrastructure can fit within an ASP world, and how GIS can be integrated with mainstream IT
technologies.
4.3 Abstraction Level 1: Decentralized Architectures
At the lowest level of abstraction, the architecture's only components are the geo-processing services. At this level, service chaining and management are handled exclusively by the client. The
next section provides a basic overview of services, followed by an analysis of service chaining in
the decentralized environment.
4.3.1 Geo-Processing Services as Basic Components
A service provides access to a set of operations accessible through one or more standardized
interfaces. In the process, the service may use other external services and operations.
Services can be grouped into two categories: data services and processing services. Data services, such as the MITOrtho server, offer customized data to users. These services are tightly coupled with specific data sets, and therefore, their capabilities describe both the interfaces they
support as well as the data they offer.¹
Processing services, on the other hand, are not associated with specific datasets. Instead, they
provide operations for processing or transforming data in a manner determined by user-specified
parameters. Processing services can provide generic processing functions, such as projection/coordinate conversion, rasterization/vectorization, map overlay, feature detection and imagery classification. Processing services also encompass the set of image manipulation services, which include
resizing images, changing colors, computing histograms, applying various filters [79] and performing geospatial statistical analysis [50]. These services can be specialized according to particular fields such as forestry, landuse, agriculture or transportation.
1. The metadata about the data can be specified according to ISO 19115 [58].
Clients can use the getCapabilities interface to retrieve a machine parsable description of the
operations (and data) supported by a service (see Appendix B.2). The capabilities listed in the
appendix may be further extended to include real-time information about the service, such as its
current load, and the estimated processing time for the next request. Such information may be critical to some clients (such as mobile ones), which exhibit time and performance restrictions. A service's capabilities can also be used to store information about the groups of authenticated users
allowed to access that service. Furthermore, the capabilities can be used to convey the cost to a user of using the
service. The cost of use may vary, depending on the type of user (individual, institution, government entity), on the amount of data/processing requested, and on the usage frequency. In e-commerce settings, clients may probe services for their price offers for specific transactions before
committing to the use of these services. Additionally, services may provide free demo or limited
versions that allow users to sample their basic capabilities.
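As a sketch of how a client might consume such a description, the following Python fragment parses a hypothetical capabilities document to discover a service's operations and supported spatial reference systems (SRSs). The element names are illustrative and do not follow any particular specification.

```python
# Hypothetical sketch: parsing a getCapabilities response.
# Element names are illustrative, not from an actual spec.
import xml.etree.ElementTree as ET

CAPABILITIES_XML = """
<Capabilities>
  <Service name="OrthoImagery">
    <Operation name="GetMap"/>
    <Operation name="GetCapabilities"/>
    <SRS>EPSG:26986</SRS>
    <SRS>EPSG:4269</SRS>
  </Service>
</Capabilities>
"""

def parse_capabilities(xml_text):
    """Return (operation names, supported SRS codes) from a capabilities doc."""
    root = ET.fromstring(xml_text)
    ops = [op.get("name") for op in root.iter("Operation")]
    srss = [srs.text for srs in root.iter("SRS")]
    return ops, srss

ops, srss = parse_capabilities(CAPABILITIES_XML)
print(ops)   # ['GetMap', 'GetCapabilities']
print(srss)  # ['EPSG:26986', 'EPSG:4269']
```

A client could cache the parsed result locally and refresh it only when the service announces a change, as discussed above.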
4.3.2 User-Coordinated Service Chaining
Given the absence of coordinating entities in the decentralized setting, a client (such as the one
in the example described in Section 4.2.1) is fully responsible for managing all the interactions
with the services. In this decentralized setting, the client must have prior knowledge of all service
locations, by maintaining a hardcoded list of the services it uses. The rules used to decide which
services are used at any point in time are also hardcoded in the client. These rules are applied when
the client needs a new set of services to use. However, in most cases, consecutive transactions
often use the same services. In order to avoid repeated retrieval and parsing of the capabilities of
frequently-used services, the client may save a local copy of this information. The frequency at
which the information is refreshed depends on how much it is likely to change from one request to
the next, or from one session to another.
Figure 4.2 illustrates the extensive amount of work performed by the client in our example to
obtain the requested image. For every request, the client constructs and sends queries to the services involved. It coordinates the sequence of requests as well as the transfer of information
between services. The client must also handle the intermediate results. For the seemingly simple
task of retrieving an image centered at an address, the management responsibilities of the client are
considerable. These responsibilities will only multiply in the case of more elaborate chains involving more services.
[Figure 4.2 diagram: the Client communicating directly with the Address Matching, OrthoImagery and Re-Projection services]
1. Client sends address to Address Matching service.
2. Address Matching service returns a list of possible matching geographic locations. Client picks one address from the list and then goes through a prepared list of imagery services to locate a service covering the selected geographic location. Client finds a suitable OrthoImagery service and sends a getCapabilities request to double-check that the service is up and indeed covers the area of interest. In the process, the client finds that the selected service cannot return the image in the client's projection system.
3. Client requests image from OrthoImagery service in a projection system supported by the service.
4. OrthoImagery service returns image in its native projection system. From a prepared list of Re-Projection services, the client picks a service that can transform the image into the desired system.
5. Client uses selected Re-Projection service to transform saved image.
6. Re-Projection service returns final image to client.
Request        Inputs                                        Output
-------        ------                                        ------
AddressMatch   address="77 Mass Ave Cambridge MA 02139",     x,y
               SRS=EPSG:26986
Map            bbox=x-a,y-b,x+a,y+b, SRS=EPSG:26986,         image.jpg
               width, height, format, etc.
ReProjImage    imageURL=image.jpg,                           reprojected image.jpg
               fromSRS=EPSG:26986 (Mass State),
               toSRS=EPSG:4269 (lat/lon)

Table 4.1: Simplified service requests with input/output parameters.
Figure 4.2 User-coordinated service chaining in decentralized architectures.
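The coordination burden described above can be sketched as follows, with the three services mocked as local functions. All names and values are illustrative; a real client would issue HTTP requests to remote services.

```python
# Sketch of the user-coordinated chain of Figure 4.2.
# The three services are mocked locally; a real client would call
# remote endpoints and the values below are purely illustrative.

def address_match(address, srs):
    # Mock: return one matched coordinate pair in the requested SRS.
    return (236000.0, 900000.0)

def get_map(bbox, srs, width, height, fmt):
    # Mock: return an image in the service's native SRS.
    return {"bbox": bbox, "srs": srs, "data": b"...jpeg bytes..."}

def reproject(image, from_srs, to_srs):
    # Mock: return the image tagged with the target SRS.
    return dict(image, srs=to_srs)

# The client coordinates every step and stores each intermediate result.
x, y = address_match("77 Mass Ave Cambridge MA 02139", "EPSG:26986")
a, b = 500.0, 500.0            # half-width/height of the area of interest
image = get_map((x - a, y - b, x + a, y + b), "EPSG:26986", 400, 400, "jpeg")
final = reproject(image, "EPSG:26986", "EPSG:4269")
print(final["srs"])  # EPSG:4269
```

Even in this minimal form, the client must sequence the calls, hold the intermediate image, and forward it to the next service — precisely the responsibilities the later architectures try to remove.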
Local storage of intermediate results at the client is one shortcoming of service chaining as
depicted in Figure 4.2. However, this storage is necessary because of the process the client follows
for constructing the chain. This process involves calling the services successively, storing the
returned output of each service, and subsequently forwarding it to the next service in the chain.
One way to overcome the shortcoming of local storage is to directly embed, in the input to one service, a request to the following service in the chain. This approach, shown in Figure 4.3, leverages
the closure characteristic of services mentioned in Section 3.3.2. The same approach is employed
in SQL, whereby a query can be embedded as a parameter in another query, hence avoiding the
need to retrieve and temporarily save its results in a table before using them.
Even though the services in Figure 4.3 appear to be calling each other, it is not necessary for
them to know or understand each other's interfaces. The construction of the requests and subrequests is still managed entirely by the client.
[Figure 4.3 diagram: the Client issuing a single nested call that links the Re-Projection, OrthoImagery and Address Matching services]
Client constructs a request leveraging service closure to chain the services. The calls to the participating
services are nested in one long call. An example of a pseudocode for the request is:
reproj.cgi?fromSRS=EPSG:26986&toSRS=EPSG:4269&
  imageURL=URLencode(ortho.mit.edu/wmtserver.cgi?request=map&
    bbox=function(www.addMatch.com/addressMatch.cgi?address="77 Mass Ave
      Cambridge MA"&SRS=EPSG:26986))
1. Instead of downloading the image itself, the client provides the re-projection service with the URL of
the image that it needs. The geographic coverage of the image in turn depends on the results of the
address matching service.
2. As a first step, the re-projection service retrieves the image pointed to by the imageURL.
3. The bounding box of the image depends on the address matched by the Address Matching service.
4. The Address Matching service returns the coordinates to the imagery service.
5. The OrthoImagery service now has all the parameters needed to construct the image. The image is
returned to the calling service, namely the Re-Projection service.
6. The Re-Projection service now has all the inputs needed to perform the re-projection. It re-projects the
image and returns it to the client.
*. For the sake of simplicity, we assume that the case of multiple address matches is handled through a
separate dialog between the client and the Address Matching service (not shown here).
Figure 4.3 Using nested calls for service chaining.
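As a sketch of how such a nested request might be assembled, the fragment below URL-encodes each inner call as a parameter of the outer one. The hostnames and parameter names follow the example above but are otherwise hypothetical.

```python
# Sketch: building the nested request of Figure 4.3 by URL-encoding
# each inner call as a parameter of the outer one. Hostnames and
# parameter names are illustrative.
from urllib.parse import quote

address_call = ('www.addMatch.com/addressMatch.cgi?'
                'address="77 Mass Ave Cambridge MA"&SRS=EPSG:26986')
map_call = ('ortho.mit.edu/wmtserver.cgi?request=map&'
            'bbox=function(' + address_call + ')')
nested = ('reproj.cgi?fromSRS=EPSG:26986&toSRS=EPSG:4269&'
          'imageURL=' + quote(map_call, safe=''))

# The inner call's '?' and '&' are percent-encoded, so the outer
# service receives it as one opaque parameter value.
print(nested[:60])
```

Encoding the inner call keeps its delimiters from being misread as parameters of the outer request, which is what makes the closure property usable over plain HTTP.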
4.3.3 Complexities of Nested Calls
While nesting calls promises to simplify some of the coordination responsibilities of the client,
it nevertheless introduces complexities in the areas of error and metadata propagation as well as
the client's ability to control certain details. In Figure 4.2, the client communicates directly with
the individual services. Consequently, when an exception is generated by a service, the client has
first-hand knowledge of that exception. As mentioned in Section 3.3.2, exceptions can be sent
either as XML documents, or as files in a format that is expected by the user. In either case, the
direct link to services enables the client to easily detect the exception and respond accordingly.
With nested calls, informing the client of the exact nature of an exception is more complicated.
Consider for example the case of the address matching service issuing an exception in
response to an erroneous call from the client. To the orthoimagery service in Figure 4.3, this
exception is viewed as an invalid input, triggering the orthoimagery service to signal an "invalid
input" exception. A domino effect ensues, as the same problem occurs over the orthoimagery/reprojection link, forcing the re-projection service to signal its own "invalid input" exception (this
time to the client). Although the client is eventually informed of the occurrence of an exception in
the chain, the actual exception received by the client does not disclose information about the
source or the cause of that exception. One way to overcome this limitation is to allow a service to automatically forward a received exception, as is, to the next service in the chain, while appending its own error messages to the forwarded exception. In this context, representing exceptions in XML is particularly useful, as it makes it easier for services to detect and
add to incoming exceptions. An example is shown below. The client can easily parse the XML-formatted exception and access the innermost error message to determine the root cause of the problem.
<WMTException version="1.0.0">
  <Service>http://coast.mit.edu/reproj.cgi</Service>
  <Message>wmtserver-001: Invalid input</Message>
  <WMTException version="1.0.0">
    <Service>http://ortho.mit.edu/wmtserver.cgi</Service>
    <Message>wmtserver-001: Invalid input</Message>
    <WMTException version="1.0.0">
      <Service>http://www.addMatch.com/addmatch.cgi</Service>
      <Message>addMatch-35001: Unsupported SRS</Message>
    </WMTException>
  </WMTException>
</WMTException>
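A client-side sketch of this unwrapping is shown below; it embeds a well-formed rendering of the nested exception (element names are illustrative) and walks to the innermost error to recover the root cause.

```python
# Sketch: a client unwrapping a relayed exception to find its root cause.
# The embedded XML is a well-formed rendering of the nested exception;
# element names are illustrative.
import xml.etree.ElementTree as ET

NESTED_EXCEPTION = """
<WMTException version="1.0.0">
  <Service>http://coast.mit.edu/reproj.cgi</Service>
  <Message>wmtserver-001: Invalid input</Message>
  <WMTException version="1.0.0">
    <Service>http://ortho.mit.edu/wmtserver.cgi</Service>
    <Message>wmtserver-001: Invalid input</Message>
    <WMTException version="1.0.0">
      <Service>http://www.addMatch.com/addmatch.cgi</Service>
      <Message>addMatch-35001: Unsupported SRS</Message>
    </WMTException>
  </WMTException>
</WMTException>
"""

def root_cause(xml_text):
    """Walk to the innermost WMTException; return (service, message)."""
    node = ET.fromstring(xml_text)
    while node.find("WMTException") is not None:
        node = node.find("WMTException")
    return node.findtext("Service"), node.findtext("Message")

service, message = root_cause(NESTED_EXCEPTION)
print(service)  # http://www.addMatch.com/addmatch.cgi
print(message)  # addMatch-35001: Unsupported SRS
```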
This approach of relaying exceptions to the client can be extended to handle metadata
propagation. One example of metadata is information about billing from individual services.
Metadata can be appended to normal data as it is passed between services. However, for services
to process and exchange documents containing both the data and its metadata, a standard is
required, as was discussed in Section 3.5.1.
Finally, we consider the issue of an unexpected delay occurrence in one of the services in a
chain. The serial nature of the chain implies that the delay propagates through the chain, and all the
way to the client. In the scenario where the client directly accesses each service, as shown in Figure 4.2, the client can abort the operation if a specified time-out period for a service expires. In that
case, the client can opt for a substitute for the timed-out service. However, with nested calls, the
client loses the direct connection to individual services, and must wait for the final overall result.
In order to control the length of this waiting period, a new global time-out may be introduced. This
time-out is controlled by the client and is communicated to every service in the chain. At any
point, if a service takes longer than this global time-out, it will abort and return an appropriate
exception to the preceding service in the chain. Despite its simplicity, this method of handling
time-outs requires the addition of at least one parameter to the services' interfaces, namely the
time-out duration. This duration ought to depend on the statistics of the processing time for a service, and on the type of the client. Mobile clients, for instance, are expected to need shorter time-outs
given their mobility and the critical dependence of their requests on location.
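The global time-out scheme can be sketched as follows, with services mocked as local functions that receive the client's deadline and abort when the remaining budget is insufficient. All timings and service names are illustrative.

```python
# Sketch: a client-controlled global time-out propagated through a chain.
# Each mocked service checks the remaining budget and aborts with an
# exception if its (assumed) processing time would exceed it.
import time

class ServiceTimeout(Exception):
    pass

def call_service(name, work_seconds, deadline):
    """Abort before starting work that cannot finish by the deadline."""
    remaining = deadline - time.monotonic()
    if work_seconds > remaining:
        raise ServiceTimeout(f"{name}: needs {work_seconds}s, "
                             f"only {remaining:.2f}s left")
    time.sleep(work_seconds)          # stand-in for real processing
    return f"{name}: done"

deadline = time.monotonic() + 0.5     # global time-out set by the client
results = []
try:
    # The third (mock) service would blow the budget and therefore aborts.
    for name, cost in [("addressMatch", 0.01),
                       ("getMap", 0.01),
                       ("reproject", 1.0)]:
        results.append(call_service(name, cost, deadline))
except ServiceTimeout as exc:
    results.append(f"aborted ({exc})")
print(results[-1])
```

In a deployed chain the deadline would travel as an extra request parameter, and the exception would be relayed back through the chain like any other, as described above.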
The efficiency of chains can be further improved through data compression. Compressing data
decreases transmission time, although it may be at the expense of increasing the processing time at
the service. Nevertheless there ought to be a trade-off point, which depends on the nature of the
services, their operations and the bandwidth available between them. Given the potentially large
size of raster data (depending on resolution and extent), data compression may be particularly beneficial in our example. If data compression is used for raster imagery, then there will be a need for an open standard (which does not yet exist). Candidates for such a standard include the Multiresolution Seamless Image Database (MrSID) and the Enhanced Compressed Wavelet (ECW) algorithm [64].
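The trade-off can be sketched numerically. The fragment below uses general-purpose zlib compression and an assumed link bandwidth purely for illustration; real imagery would use a wavelet codec such as MrSID or ECW, and the payload here is synthetic.

```python
# Sketch: does compressing a raster payload pay off on a given link?
# Payload, compression level, and bandwidth are all assumptions.
import time
import zlib

payload = b"raster-scanline-" * 50000       # ~800 KB of synthetic data

t0 = time.perf_counter()
compressed = zlib.compress(payload, level=6)
compress_time = time.perf_counter() - t0

bandwidth = 1_000_000                       # bytes/second (assumed link)
plain_transfer = len(payload) / bandwidth
comp_transfer = len(compressed) / bandwidth + compress_time

print(f"plain: {plain_transfer:.3f}s, compressed: {comp_transfer:.3f}s")
print("compression wins" if comp_transfer < plain_transfer else "plain wins")
```

The break-even point shifts with bandwidth and data entropy: highly repetitive rasters over slow links favor compression, while already-compressed imagery over fast links does not.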
4.3.4 Issues and Implications
The detailed analysis of the example illustrates how service chaining in a decentralized setting
requires deep involvement of the client. In other words, a significant amount of complexity is
imposed on the client in order to (1) locate the services and data needed for the task and (2) coordinate the dialogs among the services selected. This complexity directly contradicts the "ease of use
requirement" for wide-base adoption of a distributed geo-processing infrastructure (as discussed in
Section 3.5.3). In the remainder of this chapter, we focus on identifying alternative ways for minimizing this complexity.
In terms of locating data in a decentralized setting, an information geo-indexing scheme (such
as the .geo proposal presented in Section 2.3.6) can prove to be valuable. Such a scheme simplifies
the process of dynamically searching for data according to its location. However, given the global
extent of such a scheme, issues of data quality assurance may quickly arise, and can easily push for
controversial data certification procedures. Furthermore, such a scheme does not address the problem of locating geo-processing services, as the physical location of these services is likely to be
independent of the functionality they provide. Section 4.4 introduces catalogs as a more general
and scalable way to address this problem.
The other source of added complexity to the client derives from its responsibility for coordinating dialogs among different servers. In this context, interoperability and simplicity of service
interfaces are critical as they spare the client the burden of handling multiple interfaces for similar
services. Similarly, the use of XMIL for standardizing data and metadata exchanges between services contributes to minimizing this aspect of the client complexity. In the next section, aggregate
services are introduced as a method for hiding the complexity of service chaining from the client.
4.3.5 Aggregate Services
Aggregate services bundle pre-defined chains of services and present them to the client as one.
By handling all control and interaction among the services, they hide the complexity of service
chaining from the client. Aggregate services can be thought of as extending the capabilities of one
service by combining it with another one. For instance, an ortho imagery service can be extended
to handle additional reference systems by combining it with a generic re-projection service. Figure
4.4 illustrates how the chain in our example can be hidden from the client in a black box.
[Figure 4.4 diagram: the Address Matching, Imagery Map and Re-Projection services bundled behind a single aggregate interface; the client supplies address, SRS, layer, width, height and format, and receives the final re-projected image]
Figure 4.4 Aggregate services.
By bundling complementary services into one aggregate service, the aforementioned complexities of error reporting, metadata propagation, authentication and accounting are completely hidden from the client. For all intents and purposes, the client will not be able to distinguish
between an aggregate service and a basic one. In many cases, we expect the constituent services of
an aggregate service to be supplied by the same provider. In such cases, better efficiencies can be
achieved by using proprietary protocols for the communication among the constituent services.
Despite their benefits, aggregate services have some drawbacks. By having a single access
point to the chain, the client loses some of the flexibility and control over parameters of the individual services. For instance, in the example depicted in Figure 4.4, the client has no control over
the re-projection step. In fact, the client is not even aware that the image is being re-projected. The
invisibility of this step and the assumptions made by the aggregate service can be misleading to
some clients. Consequently, clients should be able to differentiate between basic and aggregate
services. Indeed, such information can be communicated to the user via the capabilities file, which
may include a flag for aggregate services, and a link to more information about their constituent services.
Unfortunately, aggregating services and presenting them as one service to the client has a negative impact on the size of the capabilities of the aggregate service. In our example, the aggregate
service capabilities file needs to include the long list of all possible SRSs that can now be supported by the service. As the number of the constituent services grows, the combination of options
available also grows, albeit at a much faster rate. This in turn increases the time it takes to transmit
the capabilities information from the service to the client. In many cases, however, the increase in
the transmission time is negligible, given that this information is text-based. A more serious drawback, however, is the additional processing needed by the client to parse the longer capabilities file.
Fortunately, there is a more flexible and scalable alternative to aggregate services in a distributed environment where static binding of services and calls is often not efficient. This alternative,
namely mediating (or smart) services, is covered in Section 4.5.
4.4 Abstraction Level 2: Federated Architectures with
Catalogs
From the discussion above, it becomes clear that a certain level of control is needed to prevent the
decentralized architecture from becoming chaotic. Federated architectures provide this minimal
control by finding a balance between the flexibility of decentralized setups and the coordination
advantages of centralized ones [26]. In a federated architecture, the individual services are still
loosely coupled, and maintain their autonomy. There is still no central controlling authority in the
system. Instead, catalogs are introduced to enable clients and services to find, query and browse
services on the network [100].
4.4.1 Catalogs for Service Discovery
Catalogs have long been recognized as providing an efficient way for organizing and condensing knowledge of large collections of items [39]. They provide a set of common services to support local and global information discovery, metadata retrieval, information browsing, cataloguing
and indexing. The primary function of catalogs is to help locate the address of a dataset or a service based on its metadata record. This in turn maintains the transparency of the location of
the data and services. Catalogs can index services by several categories, such as location, service
type, provider, cost, domain, certification or even performance statistics. In most cases, catalogs
will keep local copies of the basic capabilities of the services they point to. These capabilities are
updated either directly by the service provider, or by occasionally polling services for their capabilities.
In a distributed infrastructure, several catalogs may be available, and may even point to each
other. Given the existence of many catalogs, they too need interoperable interfaces. Indeed, the
OGC is working on the Catalog Services Abstract Specifications, which define interoperable spatial data catalogs that can be used to discover spatial data holdings in different environments [79].
The specifications also include interfaces for defining, adding, removing and modifying entries in
catalogs. The OGC work is also linked to the ISO TC/211 work in the area of metadata content.
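A minimal sketch of such a catalog is shown below; the record fields and the bounding-box containment test are illustrative, not drawn from the OGC specifications.

```python
# Sketch of a minimal service catalog: services register a metadata
# record, and clients query by service type and a point of interest.
# Field names and the containment test are illustrative.

class Catalog:
    def __init__(self):
        self.records = []

    def register(self, url, service_type, bbox=None, provider=None):
        """bbox = (minx, miny, maxx, maxy) for data services; None otherwise."""
        self.records.append({"url": url, "type": service_type,
                             "bbox": bbox, "provider": provider})

    def find(self, service_type, point=None):
        """Return URLs of matching services, optionally covering a point."""
        hits = []
        for r in self.records:
            if r["type"] != service_type:
                continue
            if point is not None and r["bbox"] is not None:
                x, y = point
                minx, miny, maxx, maxy = r["bbox"]
                if not (minx <= x <= maxx and miny <= y <= maxy):
                    continue
            hits.append(r["url"])
        return hits

catalog = Catalog()
catalog.register("ortho.mit.edu/wmtserver.cgi", "imagery",
                 bbox=(33000, 777000, 330000, 959000), provider="MIT")
catalog.register("reproj.cgi", "reprojection")
print(catalog.find("imagery", point=(236000, 900000)))
```

A production catalog would of course persist these records, index them spatially, and expose the queries through an interoperable interface, but the lookup role is the same.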
4.4.2 Service Discovery and Chaining
Figure 4.5 illustrates the interactions between the client, catalog and services in a federated
setup. The introduction of catalogs relieves the client of the burden of service discovery. However, the client is still responsible for deciding which services to use after consulting the catalog, and for specifying the details of service chaining. In some cases, the
catalog may return several services as matches to the client's query, in which case the client may
need to query the services directly for their detailed capabilities, and make a decision accordingly.
[Figure 4.5 diagram: the Client consulting a Catalog to locate the Re-Projection, OrthoImagery and Address Matching services]
1. Client queries the catalog for an Address Matching service and an Ortholmagery service covering area
of interest. If needed, the client also uses the catalog to find a suitable Re-Projection service. For each
query, the catalog may return one or more addresses for services that can be used [39].
2. Depending on their design, catalogs may occasionally update their listings by querying services for
their capabilities. Services that are registered in a catalog can trigger this updating process whenever
their capabilities are modified.
3. While the client does not need to store lists of the services it needs, it still has to manage and coordinate the service chaining.
Figure 4.5 The role of catalogs in a federated setup.
4.4.3 Issues and Implications
Figure 4.5 illustrates how catalogs may be used by clients to look up the services they need for their chains. Although introducing catalogs to the architecture simplifies the design of a client, the client is still required to construct and manage the service chains. This implies added client responsibilities
for communicating with catalogs and interpreting their results. Furthermore, if queries and their
results are encoded in XML, then clients need to have the local intelligence for constructing and
parsing sophisticated XML documents. For most applications, however, the communication with
catalogs does not have to be frequent. Service addresses retrieved through earlier queries can be
locally saved at the client, and subsequently used without referring back to a catalog.
In terms of implications for GIS market dynamics, we believe catalogs will be made available
by both the public and the private sectors. In the public domain, many governmental entities
already maintain a variety of detailed datasets covering their jurisdiction, and are increasingly
interested in facilitating public access to this data. A federated setup for keeping track of the data
available nationwide goes a long way towards making this data more accessible to a wider range of
users and applications. In this federated setup, each state may designate an agency to coordinate
the distributed efforts of geographic data collection, storage and dissemination within that state.¹
In this context, the designated state agency can host catalogs which index the agency's own data,
and point to other catalogs/data provided by individual counties or towns within that state.
In the private sector, catalogs are likely to be supplied by satellite imagery providers. These
providers can use interoperable catalogs to maintain indices for the images they collect, as well as
pointers to generic geo-processing services commonly needed to manipulate these images. More
geo-processing services are likely to be provided by current big GIS software providers such as
ESRI and Intergraph. Such players are likely to be interested in catalogs as a way to advertise their
own services and provide a one-stop shop for geo-processing services. Given their known brand
names and their dominance in the GIS market, it will not be surprising if these big players charge a
higher premium for their services. It remains to be seen how soon such dynamics will materialize
as they require the big players to shift their business models and modify their system architectures,
which they have been slow to do.
Catalogs also offer interesting opportunities for new players in the GIS and IT markets to provide more sophisticated search tools. As the number of services and catalogs available in an environment grows, there will be an increasing need for search-engine-like tools that can consolidate
information retrieved from various catalogs. Such tools may also provide interfaces through which
users can pick services they need. Furthermore, these tools can dispatch the users' requests to a
variety of available catalogs, and then allow users to sort the results according to different criteria,
e.g., price, quality or provider. As such, these tools are similar to popular online price comparison
sites (e.g., metaprices.com or mysimon.com), which allow users to pick a category of items to compare (e.g., CDs, books, electronics) and then return a list of items along with their prices, availability, special offers and reviews from various online shopping websites (e.g., Amazon and Barnes & Noble).
However, even with the aid of such sophisticated tools, the complexity of service chaining
from a client's perspective is still high. The next section complements our discussion of federated
architectures by introducing mediating services as efficient agents that can further raise the client's
level of abstraction when constructing a service chain.
1. For example, MassGIS is the logical choice for such an organization for the state of Massachusetts, as it
is already responsible for the collection, storage and dissemination of geographic data in Massachusetts.
4.5 Abstraction Level 3: Federated Architectures with
Mediating Services
In order to relieve clients of the need to explicitly manipulate multiple service connections and handle intermediate results, we introduce mediating services. Mediating (or smart) services act as
gateways to other services by coordinating between multiple services without necessarily storing
any data of their own. Mediating services combine the simplicity of aggregate services with the
flexibility and control inherent in decentralized architectures.
4.5.1 Mediating (Smart) Services
The concept of mediating or smart services is borrowed from the database arena. In a distributed database setting, mediating elements are often introduced to dynamically convert multi-database queries into smaller sub-queries that can then be dispatched to the various databases. The
results of the sub-queries are then integrated by the mediating element and returned to the client. In
the database literature, these mediating elements are also referred to as facilitators, brokers, and
dispatchers [36].
Correspondingly, in a distributed geo-processing infrastructure, mediating or smart services
dynamically construct and manage chains of services. Based on their client's requirements, mediating services determine appropriate data sources and services, retrieve and process the data, and
then assemble the final response. In the process, a mediating service may consult with catalogs,
search engines or meta-search tools known to it. It can also keep its own indexed lists of useful services, which are more likely to be biased towards certain providers and/or domains. For efficiency
purposes, mediating services may also provide commonly used basic functions, such as format,
coordinate or vector-to-raster conversions.
Moreover, mediating services may use pre-specified client preferences to search for appropriate data and processing services that best meet their clients' requirements. Such preferences might
include information about service time-outs, price ceilings, accuracy requirements, and maximum
number of services chained. In some cases, the client may also wish to specify a preference for a
particular service provider. The client may also impose a constraint that all services used in a particular session be supplied by the same provider, presumably to achieve certain efficiencies as well
as monetary savings.
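The preference-matching step can be sketched as a simple filter; the preference keys and candidate service records below are hypothetical.

```python
# Sketch: a mediating service filtering candidate services against a
# client's registered preferences. Preference keys and candidate
# records are hypothetical.

def match_services(candidates, prefs):
    """Keep candidates within the price ceiling and preferred provider."""
    hits = []
    for c in candidates:
        if c["price"] > prefs.get("price_ceiling", float("inf")):
            continue
        provider = prefs.get("provider")
        if provider is not None and c["provider"] != provider:
            continue
        hits.append(c["url"])
    return hits

candidates = [
    {"url": "reprojA.example/cgi", "price": 0.10, "provider": "A"},
    {"url": "reprojB.example/cgi", "price": 0.02, "provider": "B"},
]
prefs = {"price_ceiling": 0.05}
print(match_services(candidates, prefs))  # ['reprojB.example/cgi']
```

Real preference sets would also cover time-outs, accuracy requirements, and maximum chain length, as listed above, but the matching logic remains a rule-driven filter over candidate records.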
4.5.2 Mediating Services and Service Chaining
Figure 4.6 illustrates how the complexities of service chaining can be alleviated by a mediating service. With much of the complexity hidden from the client, the desired thinness of the client
is restored. As discussed earlier, client thinness is a critical element in promoting distributed geoprocessing infrastructures.
[Figure 4.6 diagram: the Client interacting only with a Smart server, which consults a Catalog and coordinates the Re-Projection, OrthoImagery and Address Matching services]
1. The complexity of locating services and coordinating their chaining is hidden from the client. The client either exposes its preferences to the mediating (or smart) service through an accessible local preference file, or registers its preferences with the smart service a priori.
2. Smart server maintains current state. A sophisticated smart server can try to anticipate the next request
of the client or cache some results that are not likely to change. Client can specify the frequency at which
latest results are refreshed at the smart server.
3. & 4. Dialogs between smart servers and catalogs are the same as in earlier described architectures
(Figure 4.2 and Figure 4.5)
Figure 4.6 Service chaining with mediating (smart) services.
4.5.3 Issues and Implications
Examining the scenario in Figure 4.6, we note that the mediating service need not necessarily
construct a new chain of services for every client request. In most cases, once the mediating service identifies a suitable chain for a particular task, subsequent requests for that task will use that
chain, unless otherwise specified in the client's preferences. Depending on the application, the client may specify that its chain be re-constructed only once per session or per request. This flexibility, however, comes with a caveat: if the chain is re-constructed for each session, then consecutive
sessions are not guaranteed to be using the same sets of data or services. Whether this is acceptable
to the client depends on the nature of the client and its application.
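The choice of re-construction granularity can be sketched as follows, with the chain-construction step mocked; the class and preference names are illustrative.

```python
# Sketch: a mediating service memoizing the chain it built for a task,
# re-building only at the granularity the client asked for ("session"
# or "request"). Chain construction is mocked; names are illustrative.

class Mediator:
    def __init__(self, rebuild="session"):
        self.rebuild = rebuild    # client preference: "session" or "request"
        self._chain = None
        self.builds = 0

    def _construct_chain(self, task):
        self.builds += 1          # stand-in for catalog queries + selection
        return [f"{task}-addressMatch", f"{task}-getMap", f"{task}-reproject"]

    def handle(self, task):
        if self._chain is None or self.rebuild == "request":
            self._chain = self._construct_chain(task)
        return self._chain

m = Mediator(rebuild="session")
m.handle("ortho"); m.handle("ortho")
print(m.builds)   # 1  (chain reused within the session)

m2 = Mediator(rebuild="request")
m2.handle("ortho"); m2.handle("ortho")
print(m2.builds)  # 2  (chain re-built per request)
```

Per-session reuse is cheaper but, as noted above, does not guarantee that consecutive sessions use the same data sources or services.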
The client may also want to specify that a particular chain used in an earlier session be used for
the current session. One way to implement this feature is to save the desired chain as a cookie on
the client, and communicate it to the mediating service with each request. Alternatively, the client
can use the existing chain, and manage the service chaining itself. In this case, the mediating service is used as a tool (or an agent) by the client to determine the optimal chain of services needed
for a particular task. It is then up to the client to execute that chain. However, this alternative may
be more expensive to the client, since subscribing to one mediating service is likely to be less
expensive than subscribing to a variety of individual data and transformation services. Furthermore, given that a mediating service will serve multiple clients, it will be in a better position to
negotiate better deals with the individual services.
Finally, it is important to point out that, at the heart of any mediating service lies a set of predefined rules which dictate what the optimal selection and chaining of services are for each task.
These rules are used to match clients' preferences with appropriate services and data sources
(accessible by the mediating service). As such, mediating services can be considered as specialized
versions of existing process management and integration tools. The need for specialization is a
consequence of the distinctive characteristics of GIS (see Section 3.5.4), especially the semantics
associated with GIS data and the complexity of spatial queries. With the wide range of possible
GIS applications and the different semantics needed in different fields, it is more likely that the
internal mediating service rules will be tuned to specific application domains.
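The rule-based matching described above can be illustrated with a toy lookup: domain-tuned rules are tried before generic fallbacks. The rule table, service URLs and function names are all hypothetical:

```python
# Illustrative only: a mediating service's predefined rules, mapping a
# (task, domain) pair to an ordered chain of services. Domain-tuned rules
# take precedence over generic ones. All names are made up for this sketch.

RULES = {
    ("route", "transportation"): ["svc://traffic-feed", "svc://shortest-path"],
    ("route", None):             ["svc://shortest-path"],
    ("basemap", None):           ["svc://map-server"],
}

def select_chain(task, preferences):
    """Return the service chain for `task`, preferring the client's domain."""
    for key in ((task, preferences.get("domain")), (task, None)):
        if key in RULES:
            return RULES[key]
    raise LookupError("no rule for task: " + task)
```

A transportation client asking for a route gets the traffic-aware chain; any other client falls back to the generic shortest-path service.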
Therefore, in terms of market dynamics, we foresee the emergence of a variety of mediating
services, which range in their "smarts" as well as the nature of their specialization. The need for
domain-tuned services will constitute excellent market entry opportunities for third party players
with significant expertise in a certain domain, but with no capabilities to single-handedly offer and
maintain all the data and transformations needed for that domain.
In summary, mediating services promise to minimize the complexity of service chaining while
providing clients with solutions that are specifically tuned to their preferences. A mediating service also provides a client with a single point of contact for accounting and authentication, as well
as error and metadata reporting. Naturally, and as we have seen in other areas, standards will be
needed in these areas to make the infrastructure scalable and the design of mediating services simpler.
4.6 Summary
The architectures, their components and the key issues presented in this chapter are summarized in
Table 4.2. In the final chapter, we consolidate the issues we uncovered thus far in this thesis. We
formulate a framework/roadmap to help developers/users make the "best" choices for building/
using a distributed geo-processing infrastructure. We then use this framework to study the implications of our analysis on the set of standards and protocols that are needed to build and support a
distributed geo-processing infrastructure.
Table 4.2: Summary of architectures: components, service chaining and issues.

Decentralized
- Components: services; aggregate services; client (including error handling, accounting and authentication).
- Potential influential organizations: OpenGIS for determining service interfaces and categorization; compression standards.
- Key issues: complexity of service chaining management for clients; simplicity of interfaces; difficulty of locating services.

Federated with catalogs
- Components: services; aggregate services; catalogs.
- Potential influential organizations: OpenGIS for registries and standards for catalogs; governmental entities for catalog and metadata coordination (linked to NSDI); private companies for the provision of specialized services; ISO & OpenGIS for metadata.
- Responsibilities of service chaining coordination: client still responsible for the chaining; catalogs are used to locate and query services; catalogs can be sophisticated and act like search engines or metasearch engines.
- Key issues: data exchange standards; catalog interfaces; catalog data return and XML parsing; service registration in catalogs; standards for metadata; opportunities for new and old players, public and private.

Federated with mediating services
- Components: services; aggregate services; catalogs; mediating (smart) servers.
- Potential influential organizations: same as above, in addition to private companies in various industries providing smart servers for niche markets.
- Responsibilities of service chaining coordination: tuned smart servers.
- Key issues: error propagation; metadata propagation; specification of preferences; degree of "intelligence".
Chapter 5
Synthesis and Conclusions
5.1 Summary of Research
This thesis is motivated by the growing need for a new distributed GIS model, in which GIS functionalities are delivered as independently-provided yet interoperable services (Point 2 in Figure
5.1). Such a model is especially beneficial for scientific research and engineering modeling as well
as state and federal government settings, where tightly coupled hierarchical systems are unlikely to
provide the desired breadth and flexibility. A distributed geo-processing model allows users in
these settings to freely combine services to create customized solutions with minimal programming, integration and maintenance efforts. Furthermore, a distributed geo-processing infrastructure can facilitate the integration of GIS with other information systems, and can also support the
needs of thin mobile clients for location-based services. A distributed infrastructure will hence be
a key enabler for GIS to extend beyond its traditional boundaries of mapping to embrace a broader
community of users and a wider scope of services.
This thesis focuses on identifying the major issues associated with building and using such a
distributed infrastructure. Recognizing that there is no "one size fits all" solution in distributed
GIS, one goal of this thesis is to consolidate these issues into a framework (or roadmap) that can be
used by developers and users to navigate through the choices available to them. In following the
framework, developers and users will still need to make implementation decisions that will be
often based on trade-offs between conflicting requirements.
The major issues were assembled by approaching the research from three complementary perspectives. The background research presented in Chapter 2 supported the viability of a distributed
model, particularly in light of the enabling IT technologies and growing e-commerce and ASP
markets. This part of the research also reinforced the crucial role standards play in such a distributed environment. In Chapter 3, we presented the synthesis of our own experience with service
design, implementation and chaining. This experience highlighted some practical aspects of building and using distributed services, and contributed to the identification of key characteristics of
GIS that set a distributed geo-processing infrastructure apart from other distributed IT systems.
Finally, in Chapter 4, we analyzed alternative architectures for realizing a distributed infrastructure
and identified their basic components. We concluded that the next generation GIS model is likely
to follow a federated architecture, where the basic components consist of services (basic and
aggregate), catalogs and mediating services. The discussion in Chapter 4 also shed some light on
expected dynamics in the new GIS marketplace, highlighting potential opportunities for new players to target a variety of niche markets by packaging some of the architectural components in ways
that best serve the target markets.
(The figure plots degree of interoperability, degree of componentization and degree of distribution as three axes; point 1 marks a local non-interoperable GIS package and point 2 fully interoperable distributed GIS components.)
Figure 5.1 The shift to distributed interoperable GIS components.
As these issues were identified, it quickly became evident that the complexity of navigating
through available choices explodes a lot sooner than expected, even for simple scenarios. This
complexity is intensified by the heavy dependence of the distributed geo-processing infrastructure
on a variety of technological areas. As expected, there can be no unique solution that fits everyone's needs and constraints in this setting, especially when the details of the underlying technologies themselves have not been fully resolved. Consequently, the trade-offs upon which the
distributed geo-processing choices are evaluated will keep evolving as different parts of the market
gravitate towards technical solutions that fit their needs.
Therefore, a framework that allows developers and users to navigate through the complexities
is valuable, because it provides them with an easy-to-follow roadmap that facilitates their search
for a solution that best fits their needs with minimal sacrifices. Such a solution should fit within an
interoperable world of GIS processing while being consistent with broader IT and web technologies.
5.2 Navigation Framework (Roadmap)
In this section, we present the framework developed after researching the issues, choices and
trade-offs involved in building and using a distributed geo-processing infrastructure. This framework is aimed at helping users and developers sort through the choices available to them. It is also
intended to assist potential service providers in identifying the nature and combination of components they are best positioned to serve and specialize in.
As we present the framework, the reader will notice that many of the individual issues facing
users and developers do not significantly differ from issues facing the IT community in general.
However, we argue that the way these issues are addressed in the GIS context may be different,
given the distinctiveness of the GIS problem (see Section 3.5.4), especially in terms of the potentially high bandwidth requirements, large sizes of data elements and the various abstraction levels at
which the same data can be used.
5.2.1 Description of Framework
We group the key issues that need to be addressed by users and service providers along five
dimensions:
1. Application environment (see Table 5.1): The application environment of the user or the
developer affects the ease with which new applications and clients can be introduced or used. In
the GIS context, it also affects the ease with which geo-data can be integrated with other non-spatial data.
2. Data characteristics (see Table 5.2): The nature of the data as well as its typical uses
within an application greatly affect the design of the application, both in terms of the extent to
which data is distributed, and the formats used to represent, save and access that data.
3. Service characteristics (see Table 5.3): These characteristics refer to those of the services
that best suit the application environment and data requirements. Service characteristics are further
shaped by the nature and constraints of potential client(s) accessing the service.
4. Client characteristics (see Table 5.4): The range of potential clients as well as the constraints of individual clients determine what type of services and data are needed, and how to best
combine them in order to meet these constraints.
5. Standards (see Table 5.5): Standards are the glue that allows components of an infrastructure to effectively work together.
In Tables 5.1-5.5, we identify, within each dimension, a basic set of questions to guide the
users of the framework in articulating their needs and constraints. For each dimension, we also
explore the available choices, their advantages (denoted by +), their drawbacks (denoted by -), and
present our recommendations (denoted by *) based on the findings of this research.
Table 5.1: Framework Dimension 1: Application environment.

Key Questions
Current IT infrastructure:
- Are the systems vertically or horizontally integrated?
- How rooted are legacy systems in the organization?
Applications:
- To what degree does spatial data need to be integrated with existing databases?
- How likely is it that applications will access different/distributed repositories of data?
Implementation choices:
- How flexible is the organization in terms of technological choices? Is the organization committed to specific vendors?
- Are our current requirements best addressed using vendor-provided solutions?
- Do interoperable services meeting our requirements exist in the market?
- Do we have the expertise to integrate these services in-house? Can we outsource that integration?
- Do we have the expertise to develop a complete solution in-house? Do we have the capabilities to maintain the services and data?
- What are our timeframe and budget constraints?

Choices, Trade-offs and Recommendations
Go with a vendor-provided solution:
+ Can provide a customized package for what is needed.
- Restrictive approach; may be difficult or impossible to add new functionality or clients with minimal effort.
* In many cases, it is the only way big organizations can provide a coherent package of geo-processing services across departments.
Develop solution in-house:
+ Allows customization and optimization of systems.
- Requires extensive investment, IT talent, maintenance and expertise.
* Allows implementation of simple tasks simply and today. Requires special attention to ensure that the solution is easily extensible. It can be used while waiting for upcoming standards.
Use existing service provider(s):
+ Provides flexibility to mix and match services, and frees you to focus on your area of expertise. The cost of the service is distributed among users.
- May still require customization and involve integration complexities. Potential problems with reliability, accountability, data integrity and security.
* Use only if services and their results can be easily integrated (requires standards). Especially recommended when services are tuned to specific application domains. Use if the back end is expensive.
Table 5.2: Framework Dimension 2: Data characteristics.

Key Questions
Data needed for current and potential applications:
- What is the type of data needed (archived, collections of features, maps)?
- How large is the size of the data typically needed for a task?
- How frequently is the data updated?
- How specialized is the data? How localized?
Typical uses of the data:
- How frequent is the use of the data?
- Do we typically use the latest versions, or do we need older ones as well?
- Are there any accuracy or performance constraints?
- Is it likely to be shared by other groups/organizations for different purposes?
- How much control do we need to have over the data?

Choices, Trade-offs and Recommendations
Keep a local copy of the data:
+ Provides faster access, especially for larger data, and more control.
- Requires maintenance and updates.
* Keep a local copy if the data doesn't change frequently, when all the topology details are needed, or when typical requests are huge.
Access on demand:
+ Only the data needed is extracted.
- Less control over the data (formats, styles, display); requires the right set of partners and standards.
* Recommended when data changes frequently, and/or when a variety of client applications need to access these data repositories.
Use proprietary format:
+ Can be optimized for internal purposes.
- Harder to integrate with other data types.
* Use when the client is tied to the back end, and/or data does not need to be integrated with other data.
Use GML:
+ Interoperable, easier to integrate data, extensible.
- Might be too complex for simple operations, cannot accommodate all data models, and can be lossy.
* Use when geo-data needs to be integrated with other sets, and/or needs to be used for a variety of applications.
Table 5.3: Framework Dimension 3: Service characteristics.

Key Questions
Resource requirements:
- How expensive is the service to set up and maintain?
- How computationally intensive is the service?
- How difficult is it to expand the number of users accessing the service?
Use in applications:
- Is the service more likely to be stand-alone, or does it need to be integrated with other services?
- How many applications/clients need the service now and in the future?
Type of service:
- Is it a commodity service, or is it a special service used in certain domains?
- How difficult is it to customize?
- Should we use/implement a map, feature or coverage service?
- Is it typically used with other services in certain combinations? Is it more efficient to offer an aggregate service instead?
- Do we have expertise in a specific domain? Do we have access to some local data or services? Is it more efficient to offer a mediating service instead? Can we find partners for complementary services/data?
- Do we have a lot of data? Is it worthwhile to construct a catalog for these services?

Choices, Trade-offs and Recommendations
Map server:
+ Easy to use; graphically formatted results can be easily used in thin clients such as web browsers.
- Provides users with no control over symbology or display characteristics; transparency issues when multiple layers are fetched.
* Use when using simple clients like web browsers, and/or when only pictures of maps are needed.
Feature server:
+ XML-encoded results, smaller size, more control over display properties.
- Requires clients to understand and handle feature characteristics.
* Use when features are needed, when the client can render the results locally, or when control is needed over how the information is filtered and displayed at the client.
Coverage server:
+ Provides users with access to the raw data; maximum control.
- Requires a thicker client; still lacks standards.
* Use when the raw data needs to be manipulated locally by the client.
Basic (self-contained) service:
+ Allows users to access and use repositories of information for a variety of applications; maximum flexibility.
- May involve complex integration efforts.
* Use if the number of services is small. Especially recommended for data services.
Aggregate service:
+ Easier to use for simple combinations.
- Complicates scalability of capabilities, and raises issues of metadata and transformations.
* Use if it can expand the number of applications that can use your basic services (e.g., coordinate transformation).
Mediating service:
+ Provides tuned services as well as access to specialized services.
- Requires the right set of partners to deliver such services.
* Use if you have or need specialized expertise in a niche market.
Table 5.4: Framework Dimension 4: Client characteristics.

Key Questions
Range of clients:
- How many clients are likely to use the services/data?
- How similar are these clients' characteristics?
- Who is providing the clients?
Individual clients:
- How thick can the client be?
- How smart can the client be?
- What are the client's constraints in terms of performance, bandwidth, display, etc.?

Choices, Trade-offs and Recommendations
Thin: Thin clients require the back-end services to handle the fetching and processing of data. Thin clients can use mediating services to handle the complexities of locating, combining and rendering any data needed. If the bandwidth of the client is also limited (as is the case with wireless mobile devices), the amount of data sent to the client might need to be minimized.
Thick: Thick clients can locally handle all or some of the processing. Thick clients can afford to use coverage services to retrieve information and manipulate it locally. Such clients can also handle service chaining.
Table 5.5: Framework Dimension 5: Standards.

Key Questions
Timing:
- Are current standards sufficient?
- Do we need new standards? Should we wait for them?
Standardization level:
- Should the data format be standardized?
- Should the data exchange format be standardized?
- Should the metadata be standardized?
- Should data interfaces be standardized?
Comprehensiveness and extensibility:
- How comprehensive should the standard be?
- How simple should it be? How extensible do we need it to be?

Choices, Trade-offs and Recommendations
Refer to Table 2.3, Table 2.4 and Table 2.5 in Chapter 2.
5.2.2 Applications of the Framework
The navigation framework helps users and developers of a distributed geo-processing infrastructure to first articulate their needs and their constraints, and then evaluate their alternatives
based on the trade-offs involved in them. Accordingly, the most appropriate solution to a given
problem depends on how that problem is positioned across the five dimensions of the framework.
Figure 5.2 illustrates how various applications can be positioned across the two dimensions of client and data characteristics. In this figure, the client is characterized as either thin or thick. The
thicker the client, the more power it has to handle and process raw geo-data. The other dimension
in Figure 5.2 is the data, which is categorized as static or real-time, depending on the frequency
with which it is updated.
(Applications positioned by client type, thin vs. thick, and data type, static vs. real-time:)
- Thin client, static data: mobile client (PalmPilot or cellular phone) requesting the nearest McDonald's locations.
- Thin client, real-time data: mobile client tracking the location of dispersed cattle in a farm; mobile client subscribing to a service that signals the nearby presence of a movie star; mobile client accessing real-time traffic conditions.
- Thick client, static data: ArcView extension for retrieving subsets of coverages on demand.
- Thick client, real-time data: real-time traffic information system used by transportation planners for local re-routing; weather forecasting.
Figure 5.2 Sample applications positioned with respect to client and data dimensions.
The examples shown in Figure 5.2 illustrate how a distributed setup can serve a range of clients and data needs. In many cases, especially for thick clients, a combination of local and distributed data is needed for typical analysis. Indeed, the value of a distributed setup to thick clients
stems from its capability to offer these clients the means to enhance their analysis by accessing distributed data on demand, and juxtaposing it on top of locally available data layers. In Figure 5.2,
ArcView is used as an example of a thick client that often requires distributed static data coverages
(such as elevation data) to be incorporated into local projects. Other thick clients, such as real-time
traffic and weather forecasting information systems, depend more on juxtaposing real-time information (such as traffic conditions and cloud coverage) on top of static data layers (such as the road
network and the landuse distribution).
Figure 5.2 also provides some interesting examples of location-based services that can be used
by thin mobile clients. These range from allowing users to find the locations of nearby restaurants,
to informing them about real-time traffic conditions or a nearby sighting of a certain
movie star. Indeed, location-based services, especially those related to recreation and transportation, are expected to be among the first and most visible applications benefitting from a distributed
geo-processing infrastructure.
5.3 Implications on Required Standards and Protocols for Future Research
Armed with a strong understanding of the issues covered in the navigation framework, we proceed
to identify the implications of these issues on the design of standards and protocols that can support scalable, extensible and easy-to-use distributed geo-processing infrastructures. We focus on
the standards and protocols within the context of the federated architectural model (described in
Section 4.5), since this model already encompasses the other architectures discussed in Chapter 4,
and is the most promising in terms of hiding service chaining complexities from the client. These
standards and protocols will ultimately affect the design and complexity of mediating services, as
well as shape the nature and the extent of their involvement in dialog exchanges with other components of the architecture.
Our next task is to outline a fundamental set of standards and protocols that are needed for an
efficient dialog structure among the various components of the architecture. This fundamental set
of standards and protocols can then be mapped into a practical pathway that the GIS community
may follow to ensure successful, scalable and extensible implementations of distributed geo-processing infrastructures. Such a pathway will serve to highlight the efforts that deserve the most
attention in the near term, while promising the largest pay-off to the community in the long run. As
stated at the beginning of this thesis, outlining a sustainable pathway is challenging because of the
pathway's dependence on continually evolving IT and web technologies. This is where our analysis of the key issues uncovered in this thesis will prove the most valuable, as it will alleviate the
challenge of coping with the inevitable uncertainties of IT evolution.
According to our earlier analysis, data exchange and message passing standards are the cornerstones of any fundamental set of standards and protocols supporting a sustainable distributed infrastructure. In this section, we discuss some of the choices available for standardizing these
cornerstones in the case of GIS. We also briefly discuss the equally important catalog and metadata
standardization issues. Throughout the discussion, our main concern is maintaining a reasonable
overall level of simplicity. As inferred from the Internet discussion in Section 2.4, simplicity of
message passing protocols and exchange data structures can go a long way if they leave enough
room for extensibility. Furthermore, given the overall "no one size fits all" conclusion of previous
chapters, we expect that there will be different combinations of standards that will work for different users and applications, as covered in the navigation framework of Section 5.2.
5.3.1 Data Format and Exchange Standards
In keeping with the goal of fitting with today's evolving web technologies, an XML-based
standard for exchanging geo-data in a distributed environment is very promising. With XML currently at the heart of web technologies, the ongoing work on GML (see Section 3.5.1) is indeed
timely. According to Section 2.2.2, the "success" of GML depends on four factors, namely: the process followed to design it, its timing with respect to other efforts in the field, the standardization
level it addresses, and the scope of functionalities it offers. If "successful", GML can have a great
impact on sharing and linking distributed geographic datasets, as well as integrating them with
other non-spatial data. It also allows GIS applications to leverage a growing number of XML tools
and technologies for data visualization (e.g., SVG and VML), data transformations (e.g., XSLT),
schema expression (e.g., XML schema and RDF), and data querying (e.g., XQL).
Following our exposure to GML in the prototyping experiment (see Chapter 3), we expect that
the extent to which GML will be successful will largely depend on the controversial issue of how
simple it ought to be set initially. Indeed, in the OGC circle, there is a tension between the view of
GML as a format that can efficiently incorporate the richness of most current data models, and the
view of GML as merely providing a means for accessing large complex databases. On one hand, if
GML is to have built-in mechanisms for handling the various data models, then the interface risks becoming too complex for simple purposes, hence limiting its utility. On the other hand, if GML is
set as a least common denominator for these models, it risks missing various aspects of these models, hence undermining its usefulness in certain cases.
In light of our analysis in this thesis, we anticipate that a simple, not necessarily all-inclusive
GML will prove more valuable to a wider audience of users. Indeed, in most typical cases, users
will be willing to forego some specific data model details for the benefit of effortlessly being able
to integrate their data with other datasets. In this case, GML is considered more of an exchange
standard than a data storage standard for GIS data. Consequently, we expect users to continue to
store their data in its current local form, and only transform it to XML as needed.
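This store-locally, exchange-as-GML pattern can be sketched in a few lines: geometry is kept in a native form (coordinate tuples) and serialized only at the exchange boundary. The helper names are our own; the element names follow GML's Point/LineString encoding as used in the example in this section:

```python
# Sketch: keep geometry in its current local form and emit GML only when
# data leaves the system. Helper names are hypothetical; element names
# follow GML's <Point>/<LineString> encoding.

def point_to_gml(x, y):
    return "<Point><coordinates>%s,%s</coordinates></Point>" % (x, y)

def linestring_to_gml(coords):
    pairs = " ".join("%s,%s" % (x, y) for x, y in coords)
    return "<LineString><coordinates>%s</coordinates></LineString>" % pairs
```

The transformation is cheap and on demand, so the local storage format never needs to change.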
Despite its advantages, an XML-based standard still carries some drawbacks. One issue with
XML encoding is that the auxiliary XML tags describing and surrounding the actual data are often
more voluminous than the data itself. A case in point is that the GML encoding of even simple collections of geometries, as illustrated in the example below, may still require an entire page of XML
tags to fully describe them.
<GeometryCollection srsName="EPSG:4326">
<geometryMember>
<Point>
<coordinates>50.0,50.0</coordinates>
</Point>
</geometryMember>
<geometryMember>
<LineString>
<coordinates>
0.0,0.0 0.0,50.0 100.0,50.0 100.0,100.0
</coordinates>
</LineString>
</geometryMember>
<geometryMember>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
0.0,0.0 100.0,0.0 50.0,100.0 0.0,0.0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</geometryMember>
</GeometryCollection>
In such simple cases, XML encoding can be inefficient and cumbersome. However, it can be
argued that this is a fair price to pay for an interoperable solution, especially since XML data can be easily compressed, and parsing its content can be efficiently performed by machine parsers that are totally transparent to users.
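The compression argument is easy to check: repetitive tag names make even a naive gzip pass shrink a GML document substantially. A quick self-contained demonstration (the document content is a toy repetition of the geometry-member pattern above):

```python
# The auxiliary XML tags are voluminous but highly repetitive, so generic
# compression recovers much of the overhead. Toy document for illustration.
import gzip

member = (b"<geometryMember><Point><coordinates>50.0,50.0"
          b"</coordinates></Point></geometryMember>\n")
gml = (b'<GeometryCollection srsName="EPSG:4326">\n'
       + member * 10
       + b"</GeometryCollection>")

compressed = gzip.compress(gml)
assert len(compressed) < len(gml) // 2   # repeated tags compress well
```

The ratio improves further as the document grows, since the same tag vocabulary recurs throughout.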
In order for GML to gain wide acceptance as a geo-data exchange standard, we argue that it
needs to offer ways for accommodating the exchange of raw binary data. This can be done by
packaging binary data, as is, in a GML document that only uses XML to convey metadata information about the characteristics of that data. Such an approach can be easily incorporated into the
coverage extensions of GML (currently being drafted at OGC). It remains to be seen if and how
this can be achieved without making GML more complex than it needs to be.
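One way to sketch this metadata-envelope idea (our own illustration, not an OGC-drafted design): the XML document carries only descriptive metadata about the binary coverage, plus a link to the raw payload, which travels outside the XML as-is.

```python
# Illustrative only: an XML envelope that conveys metadata about a binary
# coverage and references the raw payload by link rather than embedding it.
# The element names and helper are hypothetical, not part of any GML draft.

def coverage_envelope(name, fmt, payload, href):
    return (
        '<Coverage xmlns:xlink="http://www.w3.org/1999/xlink">'
        "<name>%s</name>"
        "<format>%s</format>"
        "<sizeInBytes>%d</sizeInBytes>"
        '<data xlink:href="%s"/>'
        "</Coverage>"
    ) % (name, fmt, len(payload), href)
```

The binary data is never XML-encoded, so it incurs no tag overhead; XML is used only to describe its characteristics and location.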
Given the many uncertainties surrounding GML, it is still too early to predict its future success. This is especially true considering today's broad world of other XML-based efforts, some of
which, such as VML, SVG and VRML may be used in GIS. The outcomes of these efforts may
prove sufficient to the general GIS user community, and hence bypass GML to becoming de-facto
standards for geographic data exchange.
5.3.2 Service Chaining and Data Passing
Throughout this thesis, service chaining issues have dominated our discussions as the basis for
efficient use and combination of resources in a distributed environment. It therefore follows that,
to a great extent, the shape of distributed geo-processing will depend on the technologies selected
for expressing, exchanging and executing service chains in this distributed environment. When
considering candidate technologies, the focus ought to remain on exploiting broader IT and web
technologies that exhibit an adequate degree of extensibility, and that are effective in minimizing
potentially voluminous data transfers among services and maintaining a minimal level of complexity.
For the reasons outlined in the previous section, we find XML to be a suitable technology for
expressing and exchanging service chains within distributed infrastructures. More specifically, we
argue that XML's inherent capabilities for describing nested information, coupled with its intelligent linking features (e.g., XLink and XPointer), offer the right set of tools for achieving these purposes.
As an example, consider the interaction between the client and the mediating service in the
federated architecture depicted in Figure 4.6. In this example, the mediating service can communicate to the client a potential service chaining solution, by using XML to encode the sequence of
selected services as nested lists of (XLink) pointers to these services and their parameters. The
value of XML's linking features is even greater considering their usefulness in significantly reducing the volume of data transfer among various components of the infrastructure, by allowing these
components to exchange, when possible, pointers to data instead of that data itself (as in Figure
3.7). Figure 5.3 illustrates the data flow after applying this method of information exchange to the
example described in Section 4.2.1. The client in Figure 5.3 is shown overseeing the execution of a
chain it received as an XML document from a mediating service. In order for this scenario to work,
the mediating service and the client have to agree on a DTD for the XML encoding of nested service chains.
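A hedged sketch of such an encoding: nested service elements whose href attributes point to the selected services, executed innermost-first by the client. The vocabulary here is illustrative; the thesis only requires that the client and mediating service agree on it (e.g., via a DTD).

```python
# Hypothetical chain document a mediating service might return, plus a
# depth-first executor. Service URLs and element names are made up.
import xml.etree.ElementTree as ET

CHAIN = """
<chain>
  <service href="svc://reproject">
    <param name="srs">EPSG:4326</param>
    <service href="svc://orthoimagery">
      <param name="bbox">0,0,100,100</param>
    </service>
  </service>
</chain>
"""

def execute(elem, call):
    """Run nested services depth-first, feeding each result to its parent."""
    inputs = [execute(child, call) for child in elem.findall("service")]
    params = {p.get("name"): p.text for p in elem.findall("param")}
    return call(elem.get("href"), params, inputs)

root = ET.fromstring(CHAIN)
# A stub `call` that just records the invocation order for inspection.
trace = execute(root.find("service"),
                lambda href, params, inputs: (href, inputs))
```

Here the orthoimagery service runs first and its result is fed to the re-projection service, mirroring the nesting of the XML.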
In many cases, however, the client will be incapable of handling the execution of service
chaining (see Section 4.5.3). In these cases, it will be the responsibility of the mediating service to
execute the chains and return the appropriate result to the client. Figure 5.4 shows an example of
such a client requesting an image that needs to be assembled from three map servers. In this figure,
the mediating service is shown initiating the execution of the chain and returning the composite
image to the client. Instead of downloading the three constituent images and then sending them to the mosaicking service, the mediating service in this example delivers image pointers to the mosaicking service, which, in turn, uses these pointers to retrieve the images before mosaicking them.
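The pointer-passing idea amounts to a request by reference. The mosaicking endpoint, its `MAPS` parameter and the server URLs below are all hypothetical; the point is that the mediating service ships kilobytes of URLs rather than megabytes of pixels.

```python
# Sketch of pointer passing: instead of downloading three images and
# re-uploading them, the mediating service hands the mosaicking service
# the three GetMap URLs and lets it fetch the images itself.
# The "mosaic" endpoint and its MAPS parameter are hypothetical.
from urllib.parse import urlencode, quote

map_urls = [
    "http://serverA/map.cgi?REQUEST=map&LAYERS=ortho",
    "http://serverB/map.cgi?REQUEST=map&LAYERS=ortho",
    "http://serverC/map.cgi?REQUEST=map&LAYERS=ortho",
]

# A request by reference: URLs travel over the wire, not pixels.
mosaic_request = ("http://mosaicker/mosaic.cgi?"
                  + urlencode({"MAPS": ",".join(map_urls)}, quote_via=quote))
```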
[Diagram: the client sends a request to the smart server (mediating service), which negotiates capabilities with, and issues requests to, a Re-Projection Service, an OrthoImagery Service and an Address Matching Service; the smart server returns an XML response to the client, and data flows directly among the services.]
- This figure only shows the interactions among the smart server, the client and the independent services. The catalog interactions are the same as in Figure 4.6.
- Thick lines indicate data transfer while thin lines indicate service requests.
- Data transfer is further minimized by leveraging service closure characteristics (see Section 4.3.2).
Figure 5.3 Example 1: Using XML to minimize data transfer in federated architectures.
[Diagram: the client requests a composite image from the smart server, which passes pointers to images on Map Server 1, Map Server 2 and Map Server 3 to the Mosaicking Service; the Mosaicking Service retrieves the images and returns the mosaic.]
Figure 5.4 Example 2: Using XML to minimize data transfer in federated architectures.
The examples above illustrate the usefulness of exploiting XML technologies in the implementation of geo-processing service chaining. By using such broad technologies, fewer GIS-specific capabilities and skills will be required in the process of designing and implementing
mediating services. This in turn will accelerate the rate at which they are developed and introduced
to the market, hence fueling a faster growth of distributed geo-processing. However, as discussed
in Section 4.5, certain aspects of mediating services will remain dependent on domain specific
expertise as well as on the capabilities of the services accessible to them.
5.3.3 Message Passing and Dialog Structure
In this section, we examine the message passing dialog that takes place between a mediating
service and basic services, while adhering to our general goal of minimizing the volume of transferred data. In particular, we focus our attention on the phase of the dialog in which the mediating
service queries basic services about their capabilities. As described in Section 2.3.4, the mediating
service retrieves the capabilities of a service by issuing a getCapabilities request to that service.
According to Appendix B.2, the service responds to the getCapabilities request by returning a
machine parseable listing of all interfaces supported by that service. In the case of map services,
getCapabilities additionally returns the layers of data offered along with a listing of the formats,
styles and projections supported for each layer (see example in Appendix B.2).
We argue that, although the simplicity of such a capabilities retrieval model is desirable, its
scalability suffers when the size of retrieved capabilities is large. This observation is likely to be a
major issue when dealing with aggregate map services (also known as cascading map servers
within the OGC realm1). Since such services report the capabilities of other services as their own,
it is not surprising that their capabilities consist of long lists of layers, with each layer accompanied by its respective long lists of projections and formats. For most layers, these lists of projections and formats will considerably overlap, resulting in large volumes of redundant information in
large capabilities files.
To give the reader a feel for how large these files can be, we introduce the example of an
OGC-compliant cascading map server, provided by Cubewerx, Inc. (www.cubewerx.com). This
service was used in the Web Mapping Testbeds to provide access to data layers from more than
twenty map servers in a dozen SRSs. The size of the resulting capabilities file was on the order of
1 MB (i.e., approximately 400 pages of XML code).
1. According to the OGC Web Mapping Testbed glossary, a cascading map server "can report the capabilities of other map servers as its own and transform layers from those map servers into different projections
and formats, even if those map servers cannot serve those projections and formats themselves".
It may be argued that, in comparison to the typical sizes of exchanged geographic data, the
exchange of an additional 1 MB capabilities file is tolerable. However, these capabilities can grow
exponentially as the combinations of possible layers, projections and formats quickly multiply.
The size of the capabilities will also increase with the number of map servers that are cascaded.
This is a concern in light of the expected reliance on cascading services sourcing from an even
larger pool of services in an open distributed geo-processing infrastructure. Under these circumstances, it will not be surprising to see capabilities files that are much larger than the 1 MB file in
our example. The large file sizes, coupled with the expected high frequency at which capabilities files are exchanged in the infrastructure, argue for a more scalable solution.
A simple strategy for handling some of these scalability issues relies on a multi-step process which begins by consolidating the individual lists of projections into a single master projection list that is free of redundancy and repetition. Similar master lists may be created for
supported formats or styles. The second step consists of identifying subsets of these master lists
that are the most likely to be used. In response to a first query by a mediating service, only these
subsets are returned. If the mediating service is unsuccessful in satisfying a request for a layer
based on the initial set of capabilities, it then issues another request to the cascading service, upon
which the latter returns a new list of capabilities that are more specific to the layer in question (see
example in Figure 5.5 below). A similar dialog may occur between the mediating service and the
cascading service regarding other parameters of a layer such as its format or style.
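A minimal sketch of this consolidation step, with illustrative layer names and SRS codes:

```python
# Sketch of the two-step capabilities strategy described above: consolidate
# per-layer projection lists into one redundancy-free master list, and
# advertise only the subset supported by every layer in the first response.
# Layer names and SRS codes are illustrative.
def consolidate(layer_srs):
    """layer_srs maps layer name -> list of supported SRS codes."""
    master = sorted({srs for codes in layer_srs.values() for srs in codes})
    common = sorted(set.intersection(*(set(c) for c in layer_srs.values())))
    return master, common

layers = {
    "ortho":     ["EPSG:4326", "EPSG:26986", "EPSG:26919"],
    "roads":     ["EPSG:4326", "EPSG:26986"],
    "graticule": ["EPSG:4326", "EPSG:26986", "EPSG:26917"],
}
master, initial = consolidate(layers)
# A request that falls outside `initial` would trigger a second,
# layer-specific capabilities query (step 4 in Figure 5.5).
```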
[Diagram: Venn-style sets of the projections/formats available for layers A, B, C and D, with a shaded common area and a point x outside it.]
1. Assume the cascading service cascades four services A, B, C and D. The circles in the diagram represent the projections/formats that the cascading service can provide each layer at.
2. In response to a first request by a mediating service, the cascading service only returns the projections/formats of the shaded area.
3. The mediating service uses the initial capabilities to request layer B at projection x, which results in an error.
4. The mediating service sends another request to the cascading service, upon which the cascading service returns the capabilities set of layer B, as shown.
Figure 5.5 Reducing redundancy in capabilities retrieval.
We note that a simple sequencing strategy, such as the one described above, indeed succeeds in
avoiding redundancy and repetitions in cascading service capabilities, especially when the underlying services have similar characteristics. However, this strategy fails to tackle the more serious
problem of potential exponential growth of the capabilities.
We argue that, in order to ensure scalability, a more sophisticated dialog structure is needed. It
would replace the current process of learning about service characteristics by requesting and parsing capabilities files. In the new dialog structure, a mediating service may send successive inquiries (with yes/no answers) to a service to determine its fitness for a particular purpose. The
outcome of each inquiry is used by the mediating service to refine and focus the next inquiry.
Ensuring the efficiency and scalability of such a sophisticated dialog structure would require an
additional level of standardization that is expected to take time to evolve.
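A rough sketch of such an inquiry-driven dialog, with the `supports()` predicate standing in for a standardized inquiry interface that does not yet exist:

```python
# Sketch of the inquiry-based dialog: instead of downloading a full
# capabilities file, the mediating service poses successive yes/no
# questions to the service. The supports() predicate is a stand-in
# for a standardized inquiry interface that does not yet exist.
def find_fit(service, layer, candidates, supports):
    """Probe SRS candidates one by one; each answer narrows the search."""
    asked = []
    for srs in candidates:
        asked.append(srs)
        if supports(service, layer, srs):
            return srs, asked
    return None, asked

# A toy service that only offers the layer in EPSG:26986.
def supports(service, layer, srs):
    return srs == "EPSG:26986"

best, dialog = find_fit("cascader", "ortho",
                        ["EPSG:4326", "EPSG:26986", "EPSG:26917"], supports)
```

Each exchange carries a single question and a one-bit answer, so the dialog's cost scales with the number of questions asked rather than with the size of the service's full capabilities.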
5.3.4 Other Implications
The above discussion on data exchange and message passing issues mainly covers the interactions between the client, the mediating service(s), and the individual services in the federated
architecture. In order to complete the discussion about the implications of our research on required
standards and protocols, this section briefly recounts some of the standardization aspects related to
the remaining component of the architecture, namely the catalog. With the catalog at the center of
the federated architecture, these aspects include:
- Standardizing the categorization of services in catalogs so as to facilitate the search for services according to the category they belong to. At the simplest level, the categorization may involve
grouping services into basic types such as map services, feature services, coverage services and
geo-processing services. As more services become available, it will be necessary to subdivide
these categories further according to the geographic coverage of data servers, and the specific
functionality and application domain of geo-processing services. In Section 4.4.3, we also found it
useful to categorize services according to their provider, performance, price offerings, etc.
- Standardizing the catalog interfaces which allow users and other components to query and
retrieve metadata as well as add and update service entries. The OGC has been actively involved in
this area, and has been working with the FGDC and ISO/TC211 on Catalog Services Interfaces
and Abstract Specifications (see http://www.opengis.org/techno/request.htm). For the same reasons outlined in Section 5.3.1, and especially given the advances in XML Schema and XQL, XML
is expected to play an important role in standardizing the catalog interfaces.
- Standardizing the return format and structure of query results. These results may consist of
one or more addresses of services that may be sorted according to their relevance to the query criteria. The text-based nature of XML as well as its indirection capabilities make XML a good
choice for representing such information in the form of a list of pointers to services.
The above discussion tackles catalog standardization issues from a purely technical interoperability perspective (see Section 2.2.1). However, it has also been established that effective sharing
and re-use of data among organizations also requires a minimal level of semantic interoperability
[52]. It is for this reason that the specialization of mediating services as well as catalogs is needed
in the distributed infrastructure. Addressing semantic interoperability also calls for organizations
in various industries to team up and establish standard vocabularies (in the form of XML DTDs
and Schemas) in order to facilitate data sharing and re-use within and outside their respective
industries.
It remains to be seen how and when such issues will be resolved, and how they will affect the
evolution of distributed GIS.
5.4 The Future of GIS Technologies and Markets
In summary, this research is motivated by the promising prospects of distributed GIS computing.
With the growing amount of data available and the rising interest in using that data in a variety of
settings, accompanied by the advances in web, e-commerce and mobile technologies, the market is
clearly gravitating towards a more distributed model of GIS. The question is no longer about
whether a distributed geo-processing infrastructure is appropriate, but rather about how it can be
realized, and what issues need to be accounted for in order for it to materialize and grow in a scalable and extensible fashion.
In this section, we use our understanding of the issues analyzed in this thesis to speculate on
future dynamics of the GIS marketplace. We also highlight potential challenges and opportunities
that are likely to affect the ultimate shape and growth path of distributed geo-processing.
5.4.1 Dynamics of the Future GIS Marketplace
The unbundling of GIS systems into independently-provided interoperable components, and
the delivery of subsets of GIS data to users on demand will lead to significant changes in the GIS
marketplace. Figure 5.6 outlines a potential value chain for the future GIS marketplace.
In this new distributed environment, the private sector as well as the public sector at the local,
state and federal levels will all likely contribute in establishing and maintaining a national GIS
infrastructure. The different players in the value chain will share different responsibilities according to their expertise. For instance, as discussed in Section 4.4.3, governmental agencies are well
positioned to offer and maintain public data covering their areas of jurisdiction. National agencies
such as NASA and NIMA (see Section 2.1.3) can also support and leverage such a distributed
infrastructure by providing access to their data via interoperable interfaces, and using these interfaces for accessing each other's data. In the private arena, satellite imagery providers are likely to
follow an e-commerce model for providing users with on-demand access to their huge repositories
of data.
[Diagram: a value chain running from Infrastructure Providers (e.g., MCI, AT&T) to Data Producers, Service Providers, Integrators and Service Brokers.
- Data Producers: frequently refreshed data, large number of small transactions.
- Service Providers: specialized services for niche markets; most rely on integrators and service brokers to distribute their products.
- Integrators: integrated/cascaded services that can be customized for individual clients.
- Service Brokers: search engine-like services that enable clients to search for and locate services, and mix and match them to solve their problem.]
Figure 5.6 Potential value chain for the future GIS marketplace.
With the unbundling of GIS, it will not be necessary for players to build comprehensive systems in order to gain a share of the market. The new environment will open the door for small niche players to enter this market with application-specific offerings that leverage their understanding of
particular industries or processes. In Section 4.5.3, we saw that the need for mediating services to
coordinate service chaining will provide huge market entry opportunities for these new players.
Nonetheless, these opportunities will be limited by the availability of data/service repositories and
catalogs in the market, unless the new players provide their own suites of these services. However,
even though these players possess the expertise in their respective arenas, they will not necessarily
be interested in supporting the requisite back-end services. Instead, it is more likely that they will
wait for enough services to become available on the market, and select partners from the players
that provide them.
The above dynamics and the new player opportunities apply particularly in the case of location-based services. Given the expected demand for location-based services and the rapid expansion of the wireless market, location-based services have the potential to become the killer applications that will drive the construction and delivery of various components of federated geo-processing infrastructures. Indeed, we are currently witnessing the emergence of many early versions of such location-based services on the market (see Figure 5.2). However, the underlying
architectures supporting such services may not necessarily scale well when more complex services are needed, or when these services require connecting among distributed resources that are
not easily put together in one place. The need for addressing these more complex requirements
opens the door to independent service providers for supplying the data, the services as well as the
mediating components to coordinate among various services. Furthermore, new opportunities may
be available for some service providers to target niche markets in the cases when the back-end services are expensive, when service chaining requires specific domain expertise, or when the data
provided is sensitive to local context and subcultures.
Finally, in terms of the reaction of traditional GIS systems providers in the face of the new
competition, we expect them to adapt their business models by offering access to components of
their systems through portal-like applications. Until now, the traditional players have been intentionally slow to develop thin-client applications in order to protect
their systems. In order to compete, these players will leverage their established brand names as
well as their connections with their current customers. However, in order to maintain their current
investments in their clients, we anticipate that the traditional players will tune their services to better perform when coupled with their own clients. As mentioned in Section 2.5.3, a distributed service model will not fully replace the core GIS systems, traditionally supplied by the big players.
Those core systems will always be needed for building new GIS data.
As shown in this thesis, the nature of distributed geo-processing in the future depends on many
factors. In the following section, we discuss some of the challenges that may still need to be
addressed, and some of the opportunities that are likely to accelerate the growth of distributed geo-processing.
5.4.2 Challenges and Opportunities
The future of distributed geo-processing holds many opportunities that are shaped by the
issues discussed in Section 5.3. From the technical perspective, the eventual shape and growth
path of distributed geo-processing will be determined by the outcome of the GML standardization
process, the extensibility and scalability of data and message passing protocols employed in service chaining, and their impact on the design and complexity of catalogs and mediating services.
On one hand, the future of distributed geo-processing will depend on the long term viability of
the coopetition model exercised within OGC, and the outcome of its testbed approach for standardizing web mapping (see Section 2.3.4). On the other hand, the construction and proliferation
rates of distributed geo-processing will depend on the evolution of broader IT and web technologies (see Sections 2.5 and 2.3.5), and the extent to which they can be leveraged in the GIS arena.
For example, many of the technologies used in the ASP field, especially those related to billing,
security and authentication, may be directly applicable in the case of GIS. Similarly, distributed
geo-processing may also benefit from the increasing interest within the software engineering
world to adapt current object-oriented design tools (such as UML, the Unified Modeling Language) for designing XML Schemas [12]. With the growing interest in using UML for modeling
and automating the flow of XML data and processing, a new generation of XML modeling technologies is on the horizon. These technologies can potentially be used in mediating services for
providing dynamic service chaining to their clients. The use of such general purpose tools will
undoubtedly accelerate the development of mediating services, and consequently, the evolution of
distributed geo-processing.
Finally, from the users' perspective, adopting and using the distributed geo-processing model
will depend on the public and commercial availability of reliable interoperable geographic information services. Furthermore, as discussed in Section 2.1, distributed geo-processing will be
more attractive to users as more bandwidth becomes available via technological innovations such
as the Internet II and the Next Generation Internet [76]. However, although these technologies
promise to dramatically increase bandwidth over the years, the market of wireless location-based
services will continue to be constrained by practical limitations on bandwidth, performance and
connectivity [61]. Given the upside potential of the location-based services market (see Section
2.1.6 and Section 5.4.1), these practical issues should remain a priority.
In summary, it is evident, from both the business and technical perspectives, that the future of
distributed geo-processing is indeed promising. It remains to be seen how and when it will come
about.
[Diagram: the Traditional GIS Model. The user identifies and loads the needed data layers for analysis, then locally prepares the data: re-projecting layer 1 into the desired coordinate system, extracting the needed coverage from layer 2, address matching table 1, and extracting and mosaicking orthophotos from large locally stored tiles. Only then can the analysis begin.]
Appendix B
Map and Capabilities Request Specifications
B.1 Map Interface
The Map interface is designed to provide clients of a map server with pictures of maps, possibly
from multiple map servers. Upon receiving a Map request, a map server must either satisfy the
request or throw an exception in accordance with the exception instructions. Table B.1 provides an overview of the Map request parameters, followed by an example of how it is used. For more
information on the latest specifications or updates, refer to the full specification document available at www.opengis.org/techno/specs/.
Parameter                           Description
----------------------------------  ------------------------------------------------------------
http://server-address/path/script?  URL prefix of map server.
WMTVER=1.0.0                        Request version, required.
REQUEST=map                         Request name, required.
LAYERS=layer_list                   Comma-separated list of one or more map layers, required.
STYLES=style_list                   Comma-separated list of one rendering style per requested
                                    layer, required. Examples of styles: points, contours,
                                    reference.
SRS=srs_identifier                  Spatial Reference System (a text parameter that names a
                                    horizontal coordinate reference system code), required. Two
                                    namespaces are defined: EPSG and AUTO. Map servers advertise
                                    their SRSs in their capabilities documents.
BBOX=xmin,ymin,xmax,ymax            Bounding box corners in SRS units, required.
WIDTH=output_width                  Width in pixels of map picture, required.
HEIGHT=output_height                Height in pixels of map picture, required.
FORMAT=output_format                Output format of map, required. The formats are divided into
                                    four basic groups: picture formats (GIF, JPEG, TIFF, PNG,
                                    etc.), graphic element formats (WebCGM, SVG), feature formats
                                    (GML), and other formats (MIME, INIMAGE, etc.).
TRANSPARENT=true_or_false           If TRUE, the background color of the picture is made
                                    transparent if the image format supports transparency.
                                    Default = FALSE.
BGCOLOR=color_value                 Hex value for the background color, optional.
EXCEPTIONS=exception_format         Format in which exceptions are to be reported by the server:
                                    XML document or INIMAGE (default = INIMAGE, where the error
                                    message is returned graphically to the user), optional.
Vendor-specific parameters          Optional.

Table B.1: The Map request.
An example of the use of the Map request would be:
http://b-maps.com/map.cgi?WMTVER=1.0.0
&REQUEST=map
&SRS=EPSG%3A4326
&BBOX=-97.105,24.913,78.794,36.358
&WIDTH=560
&HEIGHT=350
&LAYERS=BUILTUPA
&STYLES=0XFF8080
&FORMAT=PNG
&BGCOLOR=0xFFFFFF
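For illustration, the request above can also be assembled programmatically. The helper below simply URL-encodes the parameters from Table B.1; the parameter names follow the WMT 1.0.0 interface, and the server address is the hypothetical one from the example.

```python
# Sketch of assembling a Map request from the parameters in Table B.1.
# The server address is the hypothetical one used in the example above.
from urllib.parse import urlencode

def map_request(server, **params):
    defaults = {"WMTVER": "1.0.0", "REQUEST": "map"}
    return server + "?" + urlencode({**defaults, **params})

url = map_request(
    "http://b-maps.com/map.cgi",
    SRS="EPSG:4326",
    BBOX="-97.105,24.913,78.794,36.358",
    WIDTH=560, HEIGHT=350,
    LAYERS="BUILTUPA", STYLES="0XFF8080",
    FORMAT="PNG", BGCOLOR="0xFFFFFF",
)
```

Note that urlencode escapes reserved characters, which is why the SRS appears as EPSG%3A4326 in the example URL above.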
B.2 Capabilities Interface
The capabilities interface is designed to provide clients of map servers with a machine-parseable
listing (an XML document) of what interfaces a map server supports, what map layers it can serve,
what formats it can serve them in, etc. Below is a simplified version of the DTD for version 1.0.0.
For the full version and the latest updates, refer to http://www.digitalearth.gov/wmt/xml/.
<!ENTITY % KnownFormats " GIF | JPEG | PNG | WebCGM | SVG | GML.1 | GML.2
  | GML.3 | WMS_XML | MIME | INIMAGE | TIFF | GeoTIFF | PPM | WBMP | BLANK " >

<!-- The Service element provides metadata for the service as a whole. -->
<!ELEMENT Service (Name, Title, Abstract?, Keywords?, OnlineResource, Fees?,
                   AccessConstraints?) >
<!ELEMENT Name (#PCDATA) >
<!ELEMENT Title (#PCDATA) >
<!ELEMENT Abstract (#PCDATA) >
<!ELEMENT Keywords (#PCDATA) >
<!ELEMENT OnlineResource (#PCDATA)>
<!ELEMENT Fees (#PCDATA)>
<!ELEMENT AccessConstraints (#PCDATA)>
<!ELEMENT Capability
(Request, Exception?, VendorSpecificCapabilities?, Layer?) >
<!-- Available WMT-defined request types are listed here. -- >
<!ELEMENT Request (Map | Capabilities | FeatureInfo)+ >
<!ELEMENT Map (Format, DCPType+)>
<!ELEMENT Capabilities (Format, DCPType+)>
<!ELEMENT FeatureInfo (Format, DCPType+)>
<!ELEMENT DCPType (HTTP) >
<!-- Available HTTP request methods. -->
<!ELEMENT HTTP (Get | Post)+ >
<!-- HTTP request methods. -->
<!ELEMENT Get EMPTY>
<!ATTLIST Get onlineResource CDATA #REQUIRED>
<!ELEMENT Post EMPTY>
<!ATTLIST Post onlineResource CDATA #REQUIRED>
<!-- Available formats. Not all formats are relevant to all requests. -->
<!ELEMENT Format ( %KnownFormats; )+ >
<!ELEMENT GIF EMPTY>     <!-- Graphics Interchange Format -->
<!ELEMENT JPEG EMPTY>    <!-- Joint Photographics Expert Group -->
<!ELEMENT PNG EMPTY>     <!-- Portable Network Graphics -->
<!ELEMENT PPM EMPTY>     <!-- Portable PixMap -->
<!ELEMENT TIFF EMPTY>    <!-- Tagged Image File Format -->
<!ELEMENT GeoTIFF EMPTY> <!-- Geographic TIFF -->
<!ELEMENT WebCGM EMPTY>  <!-- Web Computer Graphics Metafile -->
<!ELEMENT SVG EMPTY>     <!-- Scalable Vector Graphics -->
<!ELEMENT WMS_XML EMPTY> <!-- eXtensible Markup Language -->
<!ELEMENT GML.1 EMPTY>   <!-- Geography Markup Language, profile 1 -->
<!ELEMENT GML.2 EMPTY>   <!-- Geography Markup Language, profile 2 -->
<!ELEMENT GML.3 EMPTY>   <!-- Geography Markup Language, profile 3 -->
<!ELEMENT WBMP EMPTY>    <!-- Wireless Access Protocol (WAP) Bitmap -->
<!ELEMENT MIME EMPTY>    <!-- Multipurpose Internet Mail Extensions -->
<!ELEMENT INIMAGE EMPTY> <!-- Display text in the returned image -->
<!ELEMENT BLANK EMPTY>   <!-- Return an image with all pixels transparent,
                              if supported by the image format -->

<!-- An Exception element indicates which output formats are supported for
     reporting problems encountered when executing a request. -->
<!ELEMENT Exception (Format)>

<!ELEMENT VendorSpecificCapabilities (your stuff here) >
<!ELEMENT Layer ( Name?, Title, Abstract?, Keywords?, SRS?,
                  LatLonBoundingBox?, BoundingBox*, DataURL?,
                  Style*, ScaleHint?, Layer* ) >
<!ATTLIST Layer queryable (0 | 1) "0" >

<!ELEMENT SRS (#PCDATA) >

<!ELEMENT LatLonBoundingBox EMPTY>
<!ATTLIST LatLonBoundingBox
          minx CDATA #REQUIRED
          miny CDATA #REQUIRED
          maxx CDATA #REQUIRED
          maxy CDATA #REQUIRED>

<!ELEMENT BoundingBox EMPTY>
<!ATTLIST BoundingBox
          SRS CDATA #REQUIRED
          minx CDATA #REQUIRED
          miny CDATA #REQUIRED
          maxx CDATA #REQUIRED
          maxy CDATA #REQUIRED>

<!ELEMENT DataURL (#PCDATA) >
<!ELEMENT Style ( Name, Title, Abstract?, StyleURL? ) >
<!ELEMENT StyleURL (#PCDATA) >
<!ELEMENT ScaleHint EMPTY>
<!ATTLIST ScaleHint min CDATA #REQUIRED max CDATA #REQUIRED>
An example capabilities document would be:
<WMTMSCapabilities version="1.0.0" updateSequence="0">
<Service>
<Name>GetMap</Name>
<Title>Acme Corp. Map Server</Title>
<Abstract>Contact: webmaster@wmt.acme.com.</Abstract>
<Keywords>bird roadrunner ambush</Keywords>
<OnlineResource>http://hostname:port/path/</OnlineResource>
<Fees>none</Fees>
<AccessConstraints>none</AccessConstraints>
</Service>
<Capability>
<Request>
<Map>
<Format>
<SGI />
<GIF />
<JPEG />
<PNG />
<WebCGM />
<SVG />
</Format>
<DCPType>
<HTTP>
<Get onlineResource="http://hostname:port/path/mapserver.cgi" />
<Post onlineResource="http://hostname:port/path/mapserver.cgi" />
</HTTP>
</DCPType>
</Map>
<Capabilities>
<Format>
<WMS_XML />
</Format>
<DCPType>
<HTTP>
<Get onlineResource="http://hostname:port/path/mapserver.cgi" />
</HTTP>
</DCPType>
</Capabilities>
</Request>
<Exception>
<Format>
<BLANK />
<WMS_XML />
</Format>
</Exception>
<Layer>
<Title>Acme Corp. Map Server</Title>
<SRS>EPSG:4326</SRS> <!-- all layers are available in at least this SRS -- >
<Layer queryable="0">
<Name>wmtgraticule</Name>
<SRS> EPSG:4326 </SRS>
<SRS> EPSG:26986 </SRS>
<Title>Alignment test grid</Title>
<Abstract>The WMT Graticule is a 10-degree grid suitable for testing alignment among Map Servers.</Abstract>
<Keywords>graticule test</Keywords>
<LatLonBoundingBox minx="-180" miny="-90" maxx="180" maxy="90" />
<Style>
<Name>on</Name>
<Title>Show test grid</Title>
<Abstract>The "on" style for the WMT Graticule causes that layer to be
displayed.</Abstract>
</Style>
<Style>
<Name>off</Name>
<Title>Hide test grid</Title>
<Abstract>The "off" style for the WMT Graticule causes that layer to be
hidden even though it was requested from the Map Server.
Style=off is the same
as not requesting the graticule at all.</Abstract>
</Style>
</Layer>
<Layer queryable="0">
<Name>ortho</Name>
<Title>MassGIS half-meter 1:5000 orthophoto series</Title>
<Abstract>Panchromatic imagery mosaics for the Boston metropolitan area.
Ground resolution: 0.5m. Photo dates range from 1992 to 1995.</Abstract>
<Keywords>Boston Massachusetts MassGIS orthophoto</Keywords>
<SRS>EPSG:26986</SRS>
<SRS>EPSG:26930</SRS>
<SRS>EPSG:26917</SRS>
<SRS>EPSG:4326</SRS>
<LatLonBoundingBox minx="-71.634696" miny="41.754149" maxx="-70.789798"
maxy="42.908459" />
<BoundingBox SRS="EPSG:26986" minx="189000" miny="834000" maxx="285000"
maxy="962000" />
<Style>
<Name>Default</Name>
<Title>The only style</Title>
<Abstract>The only style for this imagery series.</Abstract>
</Style>
<ScaleHint min="0.05" max="500" />
</Layer>
</Layer>
</Capability>
</WMTMSCapabilities>
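To illustrate how a client might consume such a document, the sketch below walks a trimmed, stand-in capabilities document and lists each named layer with its SRSs, letting child layers inherit the SRS advertised by their parent (as the comment in the example above suggests).

```python
# Sketch of how a client might walk a parsed capabilities document to list
# each named layer with its SRSs, with child layers inheriting the SRS
# advertised by their parent. The inline document is a trimmed stand-in
# for a real capabilities response, not a complete one.
import xml.etree.ElementTree as ET

CAPS = """
<WMT_MS_Capabilities version="1.0.0">
  <Capability>
    <Layer>
      <Title>Acme Corp. Map Server</Title>
      <SRS>EPSG:4326</SRS>
      <Layer queryable="0">
        <Name>ortho</Name>
        <Title>Orthophotos</Title>
        <SRS>EPSG:26986</SRS>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>
"""

def named_layers(layer, inherited=()):
    # Accumulate SRSs down the layer tree; only named layers are reported.
    srs = tuple(inherited) + tuple(s.text.strip() for s in layer.findall("SRS"))
    name = layer.findtext("Name")
    if name:
        yield name, sorted(set(srs))
    for child in layer.findall("Layer"):
        yield from named_layers(child, srs)

root = ET.fromstring(CAPS)
layers = dict(named_layers(root.find("Capability/Layer")))
```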
Appendix C
Projection and Re-projection
C.1 Projection Surfaces
[Diagram: sample projection surfaces — regular cylindrical, transverse cylindrical, oblique cylindrical, regular conic, polar azimuthal (plane) and oblique azimuthal (plane).]
Figure C.1 Sample Projection Surfaces [89].
C.2 Types of Projections
Projections have been classified according to the characteristics of the map that they maintain:
area, shape and scale [89]. For instance,
- Equi-areal projections maintain the area proportions of a map, i.e., two regions of the same size on the projected map correspond to regions of equal area on the Earth.
- Conformal projections maintain the shape characteristic: relative local angles about every point on the map are shown correctly.
- Equidistant projections maintain a correct scale of the map, whereby they ensure that the map contains one or more lines along which the scale remains true.
C.3 Interpolation/Resampling Methods
There are several methods used in the re-projection interpolation step to determine the brightness
value of each pixel in the new projected image:
- The nearest neighbor method assigns to (x,y) the original pixel value that is nearest to (x0,y0). By doing so, this method does not produce new pixel values; it only uses values that were present in the original image. This method is commonly used because it is very simple and hence computationally faster than other methods. However, it suffers from round-off errors and creates geometric discontinuities in the output map because it does not produce intermediary pixel values. In general, for visual purposes, these discontinuities are often negligible.
- The bilinear interpolation algorithm uses the weighted average of the four pixels that surround the point (x0,y0) to estimate the brightness of (x,y). The advantage of this method is that it produces a smoother and more continuous image. It is however slower because it involves more computations, and it alters the original pixel values, creating a problem for spectral pattern recognition analysis.
- The cubic interpolation method uses the weighted average of the sixteen pixels that surround the point (x0,y0), producing the sharpest and smoothest image of the three methods. Its drawback is that it is the most computationally expensive, taking on average twice as much computation as the bilinear method. Like the bilinear method, it also alters the original pixel values, creating a problem for spectral pattern recognition analysis.
C.4 Prototype Capabilities Summary
As mentioned in Chapter 3, the use of ArcInfo as the back-end server for the provision of the
prototype's re-projection capabilities imposed some constraints on the range of functionality of the
final product. For instance, the image formats supported by our prototype were restricted to those
supported by ArcInfo (bmp, tif, jpg and gif). For the same reason, the geo-referencing information
types handled by the prototype were limited to the world file and GeoTIFF formats. In addition,
the prototype inherited ArcInfo's inability to handle cases where the zoom factors differ in the x
and y directions, or where the target projection is in the latitude/longitude coordinate system.
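As a minimal illustration of the world file convention (six lines: x pixel size, two rotation terms, y pixel size, and the map coordinates of the center of the upper-left pixel), a reader might be sketched as follows. This is our own sketch, not the prototype's code; the file name used below is hypothetical.

```python
def read_world_file(path):
    """Parse an ESRI world file (.tfw, .jgw, ...) and return a function
    mapping pixel (col, row) to map coordinates (x, y).
    Line order: x pixel size, row rotation, column rotation,
    y pixel size (negative for north-up images), upper-left x, upper-left y."""
    with open(path) as f:
        a, d, b, e, c, fy = (float(line) for line in f)
    def pixel_to_map(col, row):
        # Affine transform: map coords of the center of pixel (col, row)
        return (a * col + b * row + c, d * col + e * row + fy)
    return pixel_to_map
```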
In terms of projections, a subset of the EPSG codes was supported to facilitate the chaining of
the re-projection service with WMT-compliant map servers, a majority of which use the EPSG
identifiers. However, although the EPSG codes seemingly facilitate the representation of
projections, we found that they are not yet widely used by GIS vendors such as ESRI (even though
ESRI is a participant in the WMT testbeds). Consequently, part of the implementation effort was
dedicated to creating translation tables between those codes and the internal SRS representations
in ArcInfo, and later in a script using the proj utilities. The SRSs supported, along with their
equivalent parameters in ArcInfo, are shown in Table C.1.
EPSG code   Projection          Datum   Units    Fipszone   Zone
4267        geographic          nad27   dms
4269        geographic          nad83   dms
4326        geographic          wgs84   dms
26986       stateplane          nad83   meters   2001
26987       stateplane          nad83   meters   2001
26786       stateplane          nad27   feet     2002
26787       stateplane          nad27   feet     2002
32030       stateplane          nad27   feet     3800
32130       stateplane          nad83   feet     3800
26930       stateplane          nad83   meters   102
26730       stateplane          nad27   feet     102
26717       utm                 nad27   meters              17
26917       utm                 nad83   meters              17
32617       utm                 wgs84   meters              17
26718       utm                 nad27   meters              18
26918       utm                 nad83   meters              18
32618       utm                 wgs84   meters              18
26719       utm                 nad27   meters              19
26919       utm                 nad83   meters              19
32619       utm                 wgs84   meters              19
27700       greatbritain-grid   obgm    meters

Table C.1 EPSG projections supported by the re-projection prototype.
C.5 Re-projection Approximation
During the implementation phase, we noticed that re-projection using ArcInfo was both slow and
expensive in terms of computational resources. At the same time, we observed that the resulting
re-projected versions of images covering small areas appeared to be rotated and/or scaled versions
of the original image. Indeed, as illustrated in Figure C.2 and Figure C.3, parallel lines are
maintained through certain projections of these areas. This suggested that a linear approximation
of the re-projection transformation is possible in these cases.[1]

[1] The extent to which these approximations work remains to be determined.
Given this observation, the objective of the approximation is articulated as follows: given
three non-collinear points of the original image and their exact re-projected coordinates in a new
coordinate system, determine the transformation matrix and decompose it into translation, rotation
and scaling components. The exact re-projected coordinates of the initial non-collinear points can
be found using the proj utility. After determining the angle of rotation and the scaling factors,
simple tools (such as pnmscale and pnmrotate) are used to transform the entire image. The
examples shown next illustrate the development and application of this approximation on the
re-projection of a 500x500-pixel image from Mass State Plane (Mass Mainland) to Nad83 Lat/Lon
(EPSG code 4269), Nad83 Alabama West (EPSG code 26930) and Nad83 UTM zone 17 (EPSG
code 26917). The image's extent corresponds to the values shown in the following two figures.
[Image omitted.] Figure C.2 A rectangle in Mass State Plane coordinates (x roughly 236000 to
238000, y roughly 901200 to 902800).

[Image omitted.] Figure C.3 The reprojected rectangle in Lat/Lon (longitude roughly -71.06 to
-71.04), indicating that parallel lines were maintained through the projection.
Mass X      Mass Y      Lon          Lat
236407      902490.36   -71.057965   42.372968
237587.44   902590.36   -71.043633   42.372912
237587.44   901402.36   -71.04371    42.362217
236407      901402.36   -71.05804    42.362273
236997.22   902590.36   -71.050799   42.37294
237587.44   901996.36   -71.043672   42.367564
236997.22   901402.36   -71.050875   42.362245
236407      901996.36   -71.058002   42.36762

Table C.2 Mass State Plane points projected to Lat/Lon.
C.5.1 Determining the Transformation Matrices for the Approximation
In this example, we show the steps followed to determine the approximation parameters in the
re-projection case from Mass State Plane to Lat/Lon. First, three non-collinear points are selected
and their projected equivalents in the target projection are calculated. To simplify the
decomposition of the final transformation matrix into its rotation, translation and scaling
components, homogeneous coordinates are used with z = 1. Below, matrices X and Y contain the
coordinates of the three points in the target and source projections respectively.
    [ -71.057965   42.372968   1 ]
X = [ -71.043633   42.372912   1 ]
    [ -71.04371    42.362217   1 ]

    [ 236407       902590.36   1 ]
Y = [ 237587.44    902590.36   1 ]
    [ 237587.44    901402.36   1 ]
If T is the transformation matrix (transforming Y into X), then
X = Y * T
implies T = Inv(Y) * X. (The inverse of Y exists since, as a result of the non-collinearity of
the three points, its determinant is non-zero.)
                 [ 1.21412x10^-5    -4.74399x10^-8    3.86784x10^-20 ]
T = Inv(Y) * X = [ 6.48148x10^-8     9.00253x10^-6   -1.0842x10^-19  ]
                 [ -73.98673928      34.25859062      1              ]
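The computation of T can be reproduced with plain arithmetic. The sketch below is our own illustration (not the prototype's code): it inverts Y through the 3x3 adjugate and multiplies, using the three point pairs listed above.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (valid when det(m) != 0)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def matmul(p, q):
    """Product of two 3x3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Three non-collinear source points (Mass State Plane, homogeneous coords)...
Y = [[236407.0,  902590.36, 1.0],
     [237587.44, 902590.36, 1.0],
     [237587.44, 901402.36, 1.0]]
# ...and their exact re-projected equivalents in Lat/Lon (from proj)
X = [[-71.057965, 42.372968, 1.0],
     [-71.043633, 42.372912, 1.0],
     [-71.04371,  42.362217, 1.0]]

T = matmul(inv3(Y), X)   # X = Y * T  =>  T = Inv(Y) * X
```

By construction, multiplying Y by the resulting T reproduces the three target points exactly (up to floating-point error).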
The structure of the above transformation matrix indicates that the transformation is
perspective-free (the last column is effectively [0 0 1]'), and can hence be decomposed into
translation, rotation and scaling components.
    [ a11  a12  0 ]   [ 1   0   0 ]   [ cosθ   sinθ  0 ]   [ Sx  0   0 ]
T = [ a21  a22  0 ] = [ 0   1   0 ] x [ -sinθ  cosθ  0 ] x [ 0   Sy  0 ]
    [ a31  a32  1 ]   [ Tx  Ty  1 ]   [ 0      0     1 ]   [ 0   0   1 ]
    [ Sx cosθ                  Sy sinθ                  0 ]
T = [ -Sx sinθ                 Sy cosθ                  0 ]
    [ Sx(Tx cosθ - Ty sinθ)    Sy(Tx sinθ + Ty cosθ)    1 ]
Solving, we get:
Sx = 0.1214x10^-4
Sy = 0.9x10^-5
Tx = -6113982.409
Ty = 3773223.926
θ = 0.30586509 degrees
Error (square root of the sum of squared distances) = 2.82842x10^-6
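The decomposition step itself can be sketched as follows. This is an illustration under the assumption that T has exactly the translate-rotate-scale structure derived above (row-vector convention, X = Y*T); it is not the prototype's code. The compose helper rebuilds T from the recovered parameters, which also serves as a check.

```python
import math

def compose(sx, sy, theta, tx, ty):
    """Build T = Translate * Rotate * Scale (row-vector convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[sx * c,                 sy * s,                 0.0],
            [-sx * s,                sy * c,                 0.0],
            [sx * (tx * c - ty * s), sy * (tx * s + ty * c), 1.0]]

def decompose(T):
    """Recover (Sx, Sy, theta, Tx, Ty) from a perspective-free affine T."""
    sx = math.hypot(T[0][0], T[1][0])      # column 1 carries Sx*cos and -Sx*sin
    sy = math.hypot(T[0][1], T[1][1])      # column 2 carries Sy*sin and Sy*cos
    theta = math.atan2(T[0][1], T[1][1])
    u, v = T[2][0] / sx, T[2][1] / sy      # Tx*cos - Ty*sin,  Tx*sin + Ty*cos
    tx = u * math.cos(theta) + v * math.sin(theta)    # apply inverse rotation
    ty = -u * math.sin(theta) + v * math.cos(theta)
    return sx, sy, theta, tx, ty
```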
For comparison purposes, the images obtained using this approximation, as well as those
obtained using actual re-projection in ArcInfo, are attached to this appendix. The same process is
applied to the projections from Mass State Plane to Alabama West and to UTM zone 17, as shown
below.
Mass X      Mass Y      Alabama X     Alabama Y
236407      902490.36   1955680.61    1505879.265
237587.44   902590.36   1956865.815   1506108.656
237587.44   901402.36   1957096.676   1504915.847
236407      901402.36   1955911.418   1504686.495
236997.22   902590.36   1956273.212   1505993.934
237587.44   901996.36   1956981.221   1505512.19
236997.22   901402.36   1956504.046   1504801.144
236407      901996.36   1955796.072   1505282.835

Table C.3 Mass State Plane points projected to Alabama West State X,Y.
                 [ 1.00403663    0.19432669    3.86784x10^-20 ]
T = Inv(Y) * X = [ -0.19432744   1.00404798   -1.0842x10^-19  ]
                 [ 1893717.397   553695.0472   1              ]

Sx = 1.0226694
Sy = 1.0226804
Tx = 1920882.015
Ty = 179689.0226
θ = 10.95394002 degrees
Error (square root of the sum of squared distances) = 0.10894959
Mass X      Mass Y      UTM X         UTM Y
236407      902490.36   1318939.874   4739113.671
237587.44   902590.36   1320122.167   4739247.138
237587.44   901402.36   1320256.488   4738057.319
236407      901402.36   1319074.152   4737923.877
236997.22   902590.36   1319531.02    4739180.379
237587.44   901996.36   1320189.297   4738652.169
236997.22   901402.36   1319665.319   4737990.572
236407      901996.36   1319007.065   4738518.724

Table C.4 Mass State Plane points projected to UTM zone 17 X,Y.
                 [ 1.00157022    0.11306506    3.86784x10^-20 ]
T = Inv(Y) * X = [ -0.11306415   1.001531     -1.0842x10^-19  ]
                 [ 1184212.274   3808412.052   1              ]

Sx = 1.0079318
Sy = 1.0078929
Tx = 1591338.21
Ty = 3622940.495
θ = 6.4406749 degrees
Error (square root of the sum of squared distances) = 0.10294927
C.5.2 A Simpler Approach
Given that raster images are basically arrays of pixels (which are ultimately coordinate-free), a
simpler way of obtaining the rotation angle and the scaling factors (the translation applies only to
the geographic coordinates and has no meaning on a pixel basis) is as follows:
[Diagram omitted: the original image, with side lengths Lx1 and Ly1, next to the reprojected
image, with side lengths Lx2 and Ly2 and bottom corners (Xlb, Ylb) and (Xrb, Yrb).]
Which leads to:
θ = ArcTan((Yrb - Ylb) / (Xrb - Xlb))
Sx = Lx2 / Lx1
Sy = Ly2 / Ly1
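In code, using the corner and side-length labels above, the whole computation is a one-liner per parameter (a minimal sketch, not the prototype's script):

```python
import math

def simple_params(xlb, ylb, xrb, yrb, lx1, ly1, lx2, ly2):
    """Rotation angle (degrees) from the reprojected bottom edge, and x/y
    scale factors from the side lengths before (lx1, ly1) and after
    (lx2, ly2) reprojection. Names follow the labels in the diagram."""
    theta = math.degrees(math.atan2(yrb - ylb, xrb - xlb))
    return theta, lx2 / lx1, ly2 / ly1
```

The resulting angle and scale factors can then be handed directly to tools such as pnmrotate and pnmscale.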
C.5.3 Sample Images
[Images omitted: the original image; the Lat/Lon approximation and the Lat/Lon re-projection
using ArcInfo; the Alabama West approximation and the Alabama West re-projection using
ArcInfo; the UTM 17 approximation and the UTM 17 re-projection using ArcInfo.]