Scalable and Extensible Infrastructures for Distributing Interoperable Geographic Information Services on the Internet

by

Nadine S. Alameh

B.E., Computer and Communication Engineering, American University of Beirut (1994)
M.S., Civil and Environmental Engineering, Massachusetts Institute of Technology (1997)
M.C.P., Urban Studies and Planning, Massachusetts Institute of Technology (1997)

Submitted to the Department of Civil and Environmental Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer and Information Systems Engineering at the Massachusetts Institute of Technology

February 2001

© Nadine Alameh, 2001. All Rights Reserved. The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this document in whole or in part, and to grant others the right to do so.

Author: Department of Civil and Environmental Engineering, January 19, 2001
Certified by: Joseph Ferreira, Professor of Urban Planning and Operations Research, Thesis Supervisor
Certified by: John Williams, Professor of Civil and Environmental Engineering, Thesis Reader
Accepted by: Oral Buyukozturk, Chairman, Departmental Committee on Graduate Students

Scalable and Extensible Infrastructures for Distributing Interoperable Geographic Information Services on the Internet

by

Nadine S.
Alameh

Submitted to the Department of Civil and Environmental Engineering on January 19, 2001, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer and Information Systems Engineering.

Abstract

The explosive growth in Internet-powered services has fueled the quest for new killer Internet-based applications. This quest has often led to applications based on Geographic Information Systems (GIS), especially in the emerging field of the Mobile Internet. Unfortunately, the traditional GIS model falls short of accommodating the requirements and needs of the Internet environment. A more flexible GIS model is required to support the growing need for sharing increasingly available yet distributed geographic data, and for facilitating the integration of GIS with other information systems. Such a model will be especially beneficial for scientific research and engineering modeling as well as state and federal government settings, where tightly coupled hierarchical systems are unlikely to have the desired breadth and flexibility. This next-generation flexible GIS model would deliver GIS functionalities as independently provided, yet interoperable, services over the Internet. Such services can then be dynamically chained to construct customized applications. The goal of this thesis is to develop a framework for building a scalable and extensible infrastructure that can support and facilitate the dynamic chaining of distributed services. Towards that goal, the thesis evaluates and contrasts a set of alternative architectures. In doing so, it identifies the key elements and players, and focuses on issues pertaining to error handling, back-tracing of data and services in transactions, as well as service discovery and network management.
A detailed analysis of a typical use case shows that a federated architecture is the most promising in terms of meeting the scalability, extensibility and flexibility requirements of the infrastructure. In this context, the thesis stresses the necessity of service and catalog interoperability, the need for GIS metadata standards which comply with general IT standards, and the usefulness of XML in defining extensible GIS data exchange standards. The thesis argues that the sustainability of a distributed infrastructure also depends on successful organizational partnerships, scalable schemes for network management, as well as technical enhancements of GIS services in terms of data streaming techniques and effective compression standards for GIS data on the Internet.

Thesis Supervisor: Joseph Ferreira
Title: Professor of Urban Planning and Operations Research

Acknowledgments

Thank God it's finally over! The Ph.D. process has been such an extremely demanding experience, sometimes more on the emotional level than the intellectual one. And truth be told, I could never have gone through it without the help and support of several people around me.

I would like to thank Professor J. Ferreira, my research advisor for six years, for his support and guidance throughout this study and my stay at MIT. He has been a great inspiration. I have gained a lot from his expertise, his enthusiasm and his interest in how technologies can shape our lives. My sincere thanks also go to Professor K. Amaratunga for his patience and his time, and to Professor J. Williams for his help throughout my graduate studies. My most sincere thanks also go to Cynthia Stewart for her patience and support, especially during the last few months.

I consider myself very lucky to have had John Evans as my office-mate for the last three years. I thank him for his help, sense of humor and most of all his friendship. Thanks for having the patience to listen to me nagging about this thesis on a daily basis!
Many special thanks to the CRL staff for making me feel right at home, especially Tom Grayson who always knew how to cheer me up! And of course, the PSS alums for their support and friendship, especially Raj Singh, Matt Gentile, and Ayman Ismail.

Throughout my years at MIT, I have made very special friends who have made my stay here very enjoyable! I thank them dearly, especially Fadi Karameh, Saad Mneimneh and Mazen Wehbeh from the Lebanese gang, and Petros Komodoros, Salal Humair and Terry Vendlins from the 1.00 gang! I would also like to thank my Jazzercise buddies and instructors for making me look forward to that one energy-boosting hour every day!

I can never thank enough my husband and best friend, Hisham Kassab, for his love and patience, his endless attempts at motivating me and for accommodating my weird mood swings! At some level, this experience has tremendously strengthened our relationship and I am thankful for that. I feel very lucky to have someone so nice, so caring and so smart by my side, now and in the future.

Finally, I thank my family for the unconditional support they provided me! I especially thank my mother for her endless prayers and her daily support whether on the phone or through emails. My thanks to my sister, my "baby" and great friend Rola (aka Roro), for enriching my life in ways I could never describe. And to my brother Rani for his support and encouragement and for teaching me so much about patience, perseverance and hope. I dedicate this thesis to him.

Table of Contents

1 Introduction
  1.1 Overview
  1.2 Motivation
    1.2.1 Benefits of a Distributed GIS Infrastructure
    1.2.2 The Significance of Scalability and Interoperability
    1.2.3 The Sources of Complexity
  1.3 Objectives and Contributions
  1.4 Research Methodology
    1.4.1 Looking at Existing Technologies and Efforts
    1.4.2 Learning by Doing: A Prototyping Experiment
    1.4.3 Identifying Basic Architectural Elements and Setups
  1.5 Thesis Organization
2 Background
  2.1 The Evolution of GIS
    2.1.1 Legacy GIS Systems
    2.1.2 Influences of Emerging Technologies
    2.1.3 Availability of Spatial Data
    2.1.4 Impact of the Internet
    2.1.5 Emerging Role of GIS in Today's Enterprises
    2.1.6 Mobile and Wireless Technologies
  2.2 Interoperability and Standards
    2.2.1 Interoperability
    2.2.2 Standards
  2.3 Ongoing Standardization Efforts
    2.3.1 Early Standardization Efforts
    2.3.2 Spatial Data Transfer Standard (SDTS)
    2.3.3 National Spatial Data Infrastructure (NSDI)
    2.3.4 Open GIS Consortium (OGC)
    2.3.5 World Wide Web Consortium (W3C)
    2.3.6 Other Efforts
  2.4 The Internet
    2.4.1 Identifying Key Success Factors
    2.4.2 Implications
  2.5 Application Service Providers (ASP)
    2.5.1 Overview
    2.5.2 Issues
    2.5.3 Implications
3 Case Study: Raster Image Re-projection Service Prototype
  3.1 Introduction
  3.2 Re-projection: Overview and Motivation
    3.2.1 Overview
    3.2.2 Motivation: Reasons for Picking Re-projection
  3.3 Re-projection Service: Design Process and Preliminary Interface
    3.3.1 Interface Design Process
    3.3.2 Image Re-projection Interface
    3.3.3 GetCapabilities Request
  3.4 Re-projection Service: A Prototype Implementation
    3.4.1 Implementation Options
    3.4.2 Prototype
    3.4.3 Chaining Prototype with MITOrtho Server
  3.5 Synthesis of Observations and Findings
    3.5.1 Standards and Interoperability Issues
    3.5.2 Inherent GIS Design Issues
    3.5.3 Distributed Infrastructure and Chaining Issues
    3.5.4 Distinctive Characteristics of the GIS Problem
4 Architectures: Components, Chaining and Issues
  4.1 Overview
  4.2 Approach
    4.2.1 Example Scenario and Assumptions
    4.2.2 Focus of Analysis
  4.3 Abstraction Level 1: Decentralized Architectures
    4.3.1 Geo-Processing Services as Basic Components
    4.3.2 User-Coordinated Service Chaining
    4.3.3 Complexities of Nested Calls
    4.3.4 Issues and Implications
    4.3.5 Aggregate Services
  4.4 Abstraction Level 2: Federated Architectures with Catalogs
    4.4.1 Catalogs for Service Discovery
    4.4.2 Service Discovery and Chaining
    4.4.3 Issues and Implications
  4.5 Abstraction Level 3: Federated Architectures with Mediating Services
    4.5.1 Mediating (Smart) Services
    4.5.2 Mediating Services and Service Chaining
    4.5.3 Issues and Implications
  4.6 Summary
5 Synthesis and Conclusions
  5.1 Summary of Research
  5.2 Navigation Framework (Roadmap)
    5.2.1 Description of Framework
    5.2.2 Applications of the Framework
  5.3 Implications on Required Standards and Protocols for Future Research
    5.3.1 Data Format and Exchange Standards
    5.3.2 Service Chaining and Data Passing
    5.3.3 Message Passing and Dialog Structure
    5.3.4 Other Implications
  5.4 The Future of GIS Technologies and Markets
    5.4.1 Dynamics of the Future GIS Marketplace
    5.4.2 Challenges and Opportunities
Appendix A Traditional GIS Model
Appendix B Map and Capabilities Request Specifications
  B.1 Map Interface
  B.2 Capabilities Interface
Appendix C Projection and Re-projection
  C.1 Projection Surfaces
  C.2 Types of Projections
  C.3 Interpolation/Resampling Methods
  C.4 Prototype Capabilities Summary
  C.5 Re-projection Approximation
Bibliography

List of Figures

Figure 1.1 A simplified view of the service-centered GIS infrastructure.
Figure 1.2 Setup option 1: Client coordinates among needed services.
Figure 1.3 Setup option 2: Services chain, transparently to the user.
Figure 1.4 Service chaining and metadata tracking.
Figure 1.5 The three perspectives of the research methodology.
Figure 2.1 Evolution of GIS in response to enabling technological advances.
Figure 2.2 Key drivers for unbundling and distributing GIS.
Figure 2.3 Simplicity of Internet standards and consensus for consistency.
Figure 2.4 Convergence of business and technology conditions.
Figure 3.1 Raster imagery re-projection.
Figure 3.2 Original jpg image in the Mass State Plane reference system.
Figure 3.3 Image re-projected into lat/long using ArcInfo.
Figure 3.4 Image re-projected to lat/lon using approximation method.
Figure 3.5 Prototype re-projection service: Internal flowchart.
Figure 3.6 Chaining re-projection service with MITOrtho server: Interaction flowchart.
Figure 3.7 Encoding geo-referenced imagery using XML.
Figure 4.1 Illustration of services used in the example.
Figure 4.2 User-coordinated service chaining in decentralized architectures.
Figure 4.3 Using nested calls for service chaining.
Figure 4.4 Aggregate services.
Figure 4.5 The role of catalogs in a federated setup.
Figure 4.6 Service chaining with mediating (smart) services.
Figure 5.1 The shift to distributed interoperable GIS components.
Figure 5.2 Sample applications positioned with respect to client and data dimensions.
Figure 5.3 Example 1: Using XML to minimize data transfer in federated architectures.
Figure 5.4 Example 2: Using XML to minimize data transfer in federated architectures.
Figure 5.5 Reducing redundancy in capabilities retrieval.
Figure 5.6 Potential value chain for the future GIS marketplace.
Figure C.1 Sample Projection Surfaces [89].
Figure C.2 A rectangle in Mass State Plane Coordinates.
Figure C.3 The reprojected rectangle in Lat/Lon.

List of Tables

Table 1.1: Example criteria for querying services in catalogs.
Table 2.1: The background at a glance.
Table 2.2: The three abstraction levels of interoperability issues.
Table 2.3: De Facto versus De Jure standards.
Table 2.4: Timing of standards.
Table 2.5: Levels of standardization.
Table 2.6: Organizations involved in Internet coordination and standardization.
Table 2.7: Advantages of the ASP model.
Table 3.1: Preliminary raster re-projection service query parameters.
Table 3.2: Re-projection service capabilities parameters.
Table 3.3: Sample re-projection times using ArcInfo.
Table 3.4: Distinctive characteristics of the GIS problem.
Table 4.1: Simplified service requests with input/output parameters.
Table 4.2: Summary of architectures: components, service chaining and issues.
Table 5.1: Framework Dimension 1: Application environment.
Table 5.2: Framework Dimension 2: Data characteristics.
Table 5.3: Framework Dimension 3: Service characteristics.
Table 5.4: Framework Dimension 4: Client characteristics.
Table 5.5: Framework Dimension 5: Standards.
Table B.1: The Map request.

Chapter 1
Introduction

1.1 Overview

Geographic Information Systems (GIS) have been in existence since the late 1970s, primarily as stand-alone applications used to "capture, model, manipulate, retrieve, analyze, and present geographically referenced data" [108]. Traditionally, these systems have been independently developed by organizations outside their Information Systems (IS) departments in response to specific internal user needs. Unfortunately, this independent development resulted in GIS becoming fragmented and isolated from mainstream Information Systems, in addition to creating numerous incompatible data formats, structures and spatial conceptions.
Today, the incompatibility problems are accentuated by the growing need for sharing increasingly available GIS data, and the proliferation of the Internet as an infrastructure for sharing and distributing this data. Furthermore, in today's more networked world, new requisites are emerging due to recent pushes for integrating GIS with other Information Systems, and for supporting the requirements of mobile computing devices. It is now evident that the traditional model for designing, delivering, and using GIS needs to be adjusted to accommodate the requirements of a more dynamic networked environment.

This thesis is based on the premise that a new scalable and extensible infrastructure is needed to support a more flexible model for delivering and using GIS. The next generation GIS model is likely to deliver GIS functionalities over the Internet, as independently-developed, yet interoperable autonomous services. Such services are essentially processes that run on web servers, offering fundamental GIS functions via standardized access interfaces. Under this model, potential users have the option of combining and chaining services in order to construct customized solutions for their problems, and to integrate them with their current systems.

However, in an environment where both services and data are constantly added, removed and updated, dynamic chaining of services will be a complex process. In order to understand how to manage this complexity, the thesis introduces a set of coordination elements (such as catalogs and mediating agents), whose purpose is to facilitate, simplify and possibly optimize the dynamic chaining of services. One of the main goals of this work is to evaluate some available architectural choices and associated issues and trade-offs for supporting a flexible infrastructure that is both scalable and extensible in light of the complexities of service chaining.
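To make the notion of service chaining concrete, the sketch below shows one way a client could compose two such services by hand: the request sent to a re-projection service embeds, as a parameter, the URL of a request to an imagery service, so that the first service fetches its input from the second. The endpoints and parameter names here are hypothetical, chosen only to illustrate the pattern of nested web-service calls.

```python
from urllib.parse import urlencode

def service_request(base_url, **params):
    """Compose a request URL for a service's standardized web interface."""
    return f"{base_url}?{urlencode(params)}"

# Step 1: a request for imagery in its native spatial reference system
# (hypothetical imagery service and layer names).
imagery_url = service_request(
    "http://ortho.example.edu/imagery",
    REQUEST="GetMap",
    LAYERS="orthophotos",
    SRS="EPSG:26986",                      # Mass State Plane
    BBOX="230000,890000,231000,891000")

# Step 2: hand that URL to a re-projection service, which would fetch the
# image itself and return it re-projected into lat/lon. The imagery request
# becomes a parameter of the re-projection request -- a nested call.
chained_url = service_request(
    "http://reproject.example.edu/service",
    REQUEST="Reproject",
    SOURCE=imagery_url,
    TARGET_SRS="EPSG:4326")               # plain lat/lon

print(chained_url)
```

The client never touches the intermediate image: the data flows directly from the imagery service to the re-projection service, which is precisely what makes back-tracing and error handling in such chains an infrastructure-level concern.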
A significant challenge lies in identifying and prioritizing the issues that need to be addressed, both in the short run and the long run. In addressing this challenge, current and anticipated GIS requirements (such as high bandwidth, semantics and performance requirements) must be accommodated, with the understanding that the only constants are rapid change and innovation.

The Internet and its technologies are at the heart of this research given the crucial role they play in the push for unbundling and distributing geographic services. At the center of today's dynamic environment, the Internet is the current de facto information infrastructure and standard for communication and networking [20], and is one of the key drivers for the interest in a distributed geo-processing infrastructure. Accordingly, in this thesis, the Internet is assumed to be the platform upon which the geo-processing infrastructure is built. With its tremendous growth over the last decade, its global reach, simplicity and ease of use, as well as its tested and evolving technologies, the Internet indeed provides a fertile environment for delivering and accessing the specialized geo-processing services.

It is important to clarify at this point that by using the Internet to distribute and access the services, such services will not necessarily be public to all. In fact, Internet technologies are increasingly being used as default platforms in today's organizations for distributing applications and information, both within the enterprise (Intranets) and among their partners (Extranets). Therefore, organizations can provide Internet-based geo-processing services internally to their employees, and externally to their partners, without necessarily making them available on the global Internet.

1.2 Motivation

The motivation for this thesis stems from the interest in understanding how recent developments in the Information Technology field can enhance the design, delivery and use of GIS technology. In the last decade, the IT field has witnessed the exponential growth of the Internet, a growing popularity of handheld and mobile device computing, countless implementations of Enterprise Resource Planning systems, and the introduction of new concepts such as Intranets, Extranets, and Application Service Providers. These emerging technologies are having a direct impact on today's business environment and are changing users' expectations and requirements for the technology.

In light of the current IT developments, the traditional model for delivering and using GIS falls short of accommodating the new users' requirements. Unbundling GIS and distributing their functionalities as interoperable services on a network (such as the Internet) promises to lead to a better service model for current and potential GIS users. In the next sections, we show how the distributed model can indeed meet the new requirements, and how it is currently materializing with a push from various organizations and experts in the GIS field.

1.2.1 Benefits of a Distributed GIS Infrastructure

Anyone who has used GIS in the last two decades is probably familiar with what is referred to in this thesis as the traditional model of using GIS (see Appendix A). This model requires users to first get trained on a specific GIS package (such as ArcInfo) before using its spatial analysis and visualization tools in their projects. Moreover, the analysis stage of a project is typically preceded by a lengthy data preparation phase. This phase often includes finding and buying CD-ROMs of data and imagery for areas covering or overlapping with the project's area of interest, converting the data into formats usable by the GIS package of choice, and extracting, from the data collected, a subset that is directly relevant to the particular project. Only after this lengthy data assembly and extraction process is the user ready to perform the analysis for the project.
As a result, it is estimated that 60% to 85% of the cost of a typical GIS implementation project is due to the data conversion and integration step [48]. Furthermore, for most purposes, users exercise only a fraction of the functionalities offered by their GIS package [50]. Therefore, in most cases, the costly packages are under-utilized.

Nowadays, the growing amount of data available in a variety of incompatible formats, and the broadening base of users (many interested in GIS applications in new fields), imply that a more flexible model is needed for delivering on-demand, easy-to-use subsets of GIS data and functionalities. The new model (Figure 1.1) is based on the concept of unbundling the functionalities of current stand-alone GIS systems into interoperable autonomous components/services. The unbundled functionalities can be used individually (or in sets) to perform specific tasks with little training. Such a model also facilitates the integration of these services with other Information Systems in organizations, a trend that is on the rise given the recent wave of Enterprise Resource Planning implementations and the increasing recognition of the utility of spatial data in today's businesses [70].

[Figure 1.1 A simplified view of the service-centered GIS infrastructure: clients such as mobile devices, browsers and information systems access bundled and individual services (overlay, imagery, vector data, address matching) through standardized interfaces.]

Today's demand for individual GIS functionality is also fueled by a growing market of handheld and mobile computing devices that require a subset of functionalities at a time, and are limited in their processing power, memory and bandwidth.
Nowadays, the availability of inexpensive hardware, standardized Internet communication protocols, and the ease of connecting GPS devices to handheld devices are making these devices accessible to many parties. Indeed, many companies in the transportation, telecommunications, and agriculture fields view the use of handheld devices and location-based services as necessary for gaining a competitive edge and increasing productivity [105]. A model that allows handheld devices to access GIS services via standardized interfaces will enable these devices to broaden the range of applications they can support in the field. This could be achieved by having the device connect to a specialized server via a simple protocol, which, depending on the query, conveniently returns only the data that is needed at a particular time for a particular location. In summary, an infrastructure that supports a network of distributed, interoperable and autonomous GIS services promises to address the emerging requirements of today's users. It saves the growing number of GIS users from the burden of buying, setting up, learning and maintaining a full GIS package. It also enables organizations to better share and make use of available data, as they can access and chain the available services to transform or process any data to fit their interests [76]. Finally, such a service-oriented infrastructure allows clients (including handheld and mobile devices) to focus on visualization and interaction with data while leaving the computationally intensive tasks and data management to the services. The proposed infrastructure hence provides users with the flexibility they currently seek as a result of today's extensive business changes and rapid technology advances. It also offers them a much-needed breadth of independently-provided services that they can freely mix and match to create better customized applications, with minimal in-house development.
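The "simple protocol" idea above can be sketched as follows. The endpoint and parameter names here (gis.example.com, lat, lon, width, height, scale) are hypothetical illustrations chosen for this sketch, not the interface of any actual service: the thin client sends only its location and display constraints, and the server returns just the data needed for that view.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for a location-aware map service.
SERVICE_URL = "http://gis.example.com/view"

def build_view_request(lat, lon, width_px, height_px, scale):
    """Build a request for only the data visible at the device's
    current location, sized to its small screen."""
    params = {
        "lat": lat, "lon": lon,  # position, e.g. from an attached GPS
        "width": width_px,       # device screen width, in pixels
        "height": height_px,     # device screen height, in pixels
        "scale": scale,          # requested map scale
    }
    return SERVICE_URL + "?" + urlencode(params)

# A PDA with a 160x160 screen requesting a view of its surroundings:
url = build_view_request(42.36, -71.09, 160, 160, 10000)
```

Because the server clips and scales the data before sending it, the device's limited bandwidth and memory are spent only on what will actually be displayed.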
1.2.2 The Significance of Scalability and Interoperability

Given the description of the service-centered infrastructure presented thus far, this thesis focuses on efficient and scalable mechanisms for facilitating the chaining and interactions of the dynamic services. We argue that there are two reasons why this focus is timely and particularly relevant to the future success of this infrastructure. First, given the recent advances in networking, distributed computing and databases, it is now technologically feasible to implement the various autonomous services and make them available to users via standardized interfaces over the Internet. Indeed, the construction of the infrastructure is made possible by the evolution of infrastructures and middleware technologies that support distributed and client/server computing (such as RMI, CORBA, DCOM, ODBC, and JDBC). In addition, the infrastructure can employ the best of today's Internet technologies, including TCP/IP, HTML, XML, and Java, as well as standard web practices for security and authentication. The availability of these technologies today helps shift the focus from the task of building and providing the services from scratch to addressing higher-level infrastructure issues, such as scalability, interoperability and extensibility in the distributed environment. Indeed, the infrastructure will be usable and sustainable in the long run only if the distributed setup can scale with the number of users and services. The second reason for focusing on efficiency and scalability is that the service-centered infrastructure is a common "vision" in the GIS community. Indeed, this new model for delivering GIS is a frequent discussion topic in today's GIS literature (refer to References) and is being advocated by the Web Mapping group of the Open GIS Consortium [78].
This is a result of an increasing interest in web mapping, a growing acceptance of the Application Service Provider (ASP) model, and the availability of technologies that support such an infrastructure. Several working examples currently exist and are successfully being used. In this section, we introduce a few such services.

- The MITOrtho Server Technology
The MITOrtho Server (http://ortho.mit.edu) technology is a product of several years of research at the Computer Resource Laboratory at MIT. Through an easy-to-use interface, users can extract the orthophotos1 at the resolution and viewport they need. The interface (which is currently compliant with the Web Mapping Testbed specification) can be used in a variety of applications (including web browsers and GIS packages) to produce customized snippets of orthophotos from multiple gigabytes of images archived on the server. Other compliant map servers can be found at http://www.opengis.org.

- The Etak EZ-Locate Service
The Etak EZ-Locate Service (http://www.etak.com) is an online service that provides real-time access to the Etak geocoding service over the Internet. The technology consists of an address matching engine, which returns, using a currently proprietary interface, the most accurate match(es) for the street address(es) requested by the user or the application.

- The Microsoft TerraService
The Microsoft TerraService (http://terraserver.microsoft.com) is a programmable interface to TerraServer's online database of high-resolution USGS aerial imagery and scanned USGS topographic maps. The service uses the Microsoft .NET framework to provide users with methods for performing standard queries against the TerraServer databases.

- The MapQuest LinkFree Service
The MapQuest LinkFree service (http://www.mapquest.com) allows users to create maps for any address they choose.
In the form of a URL, users can specify parameters such as the address to be mapped and the desired dimensions, style and zoom level of the map. These examples are introduced here to show that the construction of the cited infrastructure has actually begun, and with the emergence of more of these services by independent parties, its growth is becoming evident. For this reason, the thesis emphasizes the importance of addressing scalability and interoperability issues at this stage, before more of these independently-provided services become available, making it harder to find common ground for standardization and acceptance. Without standardization, the sustainability of the infrastructure in the long run will be jeopardized. In the next section, an example scenario is presented. Although seemingly simple, the scenario shows that the complexity of the options explodes a lot sooner than expected.

1. Orthoimages are raster images that have been analytically rectified for tilt and relief, so that every location on the map is in its true geographic position [38]. They are often used as backdrops to vector data to validate accuracy, and as visualization enhancers that allow users to associate their data with the natural geography of areas of interest, as well as for contour generation and digitization [8]. They are used in a wide variety of fields, including urban and residential infrastructure, transportation, agriculture, forestry, and environmental monitoring, to list a few [54].

1.2.3 The Sources of Complexity

Controlling the rapidly growing complexity of dynamic chaining and coordination of services in the distributed infrastructure, without compromising performance and scalability, is the main motivation of this thesis. Chaining occurs when a request sent by a client cannot be fulfilled by a single service, but rather by combining or pipelining results from several complementary services.
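Such pipelining can be made concrete with a minimal client-side sketch: an address-matching service is called first, and its result is fed to an imagery service. The endpoints, parameter names and response shapes below are hypothetical stand-ins for illustration only, not the interface of any actual service.

```python
from urllib.parse import urlencode

# Hypothetical endpoints for an address-matching and an imagery service.
GEOCODER_URL = "http://geocoder.example.com/match"
IMAGERY_URL = "http://ortho.example.com/image"

def geocode_request(address):
    """Build the address-matching request (first link of the chain)."""
    return GEOCODER_URL + "?" + urlencode({"address": address})

def image_request(lon, lat, width=512, height=512, fmt="jpeg"):
    """Build the imagery request, centered on the matched coordinates."""
    return IMAGERY_URL + "?" + urlencode({
        "lon": lon, "lat": lat,
        "width": width, "height": height, "format": fmt,
    })

def fetch_centered_image(address, fetch):
    """Client-coordinated chain: geocode first, then request the image.

    `fetch` is whatever HTTP-GET function the client uses; it is passed
    in so the coordination logic stays independent of the transport.
    """
    # The address-matching service returns candidate coordinates;
    # here we simply take the first (best) match.
    matches = fetch(geocode_request(address))
    lon, lat = matches[0]
    # The imagery service returns the image for those coordinates.
    return fetch(image_request(lon, lat))
```

Note that in this sketch the client itself holds the coordination logic; the alternative dialog structures examined next move that logic into the services or into a dedicated coordinator.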
The thesis examines efficient and practical options for the dialogs that occur among the individual services, and between the services and the calling client. To give the reader a feel for this complexity, consider the following example, which uses the MITOrtho and Etak servers mentioned in Section 1.2.2. In this simple example, assume that the client application's objective is to retrieve an orthoimage that is centered around a variable address supplied by the user. In order to achieve that objective, the client would, either directly or indirectly, use the Etak service to match the address with a geographic location, and subsequently the MITOrtho service to get the corresponding image. This seemingly simple example is complicated by several issues.

- Dialog Issues: How should the dialog among the three parties in this example evolve? Should the client coordinate the requests to the two services by first sending the address to the Etak service, picking the desired location from a list of possible results returned by the service, and then sending the location to the MITOrtho service (see Figure 1.2)? Or should the two services somehow communicate, by having the client send the address directly to the MITOrtho service, which, transparently to the user, would use the Etak service to match the address with a geographic location that can then be used to get the appropriate image (see Figure 1.3)? Or better yet, should there be a third service responsible for coordinating between these two services and minimizing the amount of data transferred between them?

- Coordinate Systems Issues: What if the location returned by the Etak service is in a coordinate system that is not supported by the MITOrtho service? Should the latter service return the next best thing to the client, perhaps a locally reprojected image?
Or should the MITOrtho service automatically make use of another re-projection service, capable of transforming the image from the local coordinate system to the one requested by the user? If so, should the user know about the image re-projection (in which case, its quality might be compromised)? Or should the user have the option to specify a re-projection service to be used, as well as various re-projection parameters?

Figure 1.2 Setup option 1: Client coordinates among needed services. [1. Client sends the address to the Etak address matching service. 2. Address matching service returns geographic coordinates corresponding to the received address. 3. Client sends the coordinates, along with other parameters such as image size, resolution and format, to the MITOrtho service. 4. MITOrtho service processes the request and returns an image centered at the desired location.]

Figure 1.3 Setup option 2: Services chain, transparently to the user. [1. Client sends the request for an image centered at the given address to the MITOrtho service. 2. MITOrtho service, transparently to the client, sends the address to the Etak service to get the matching coordinates. 3. Etak service returns the geographic coordinates corresponding to the address. 4. MITOrtho service delivers an image centered at the address.]

- Metadata Tracking Issues: In the case where the MITOrtho service employs a re-projection service, or in the case where the MITOrtho service itself assembles the images from various other services, how should the chain of requests leading to the final image be relayed back to the client (Figure 1.4)? How would the user know where the data came from, and how/where it was transformed en route? How could the user be given control over the path of the data, if the user so desires?

- Caching, Semantics and Authentication Issues: What if the MITOrtho service updates its images on a daily basis?
What happens when the user makes the same request tomorrow as the one made today? Should the service send the latest data by default, or should there be a way, through some dialog between the service and the client, for the service to return to the client the same image that was requested earlier? Where should the state of the client be maintained? How should the design handle cases where the same service might return a different result (perhaps the same image at a different resolution, perhaps the same image but at a different price) depending on the requesting client, as authenticated?

Figure 1.4 Service chaining and metadata tracking. [The figure shows the MITOrtho service assembling an orthoimage from several imagery provider services; a re-projection service re-projecting the image from one coordinate system to another; a vector data provider service returning a certain layer at the extent specified; and an overlay service overlaying the input image and the vector data and sending the overlay to the client.]

- Service Management Issues: Suppose there are several MITOrtho-like services available on the infrastructure. What is the best method for finding services needed by the client that meet certain criteria (performance, price, quality, location, etc.) (see Table 1.1)? What type of coordinating elements can be introduced to the infrastructure to facilitate this task without compromising scalability or performance? What is the role of catalogs, and how do they integrate with the rest of the services?

Criteria | Example
Best Quality or Closest Matching Data | Get an image that has not undergone transformations that might have degraded its quality.
Fastest Response | Use an approximation for projecting the image to the desired coordinate system in order to minimize processing time.
Most Reliable Sources | Use data stored by a government entity or a certified organization.
Least Expensive Transactions | Use free or demo services instead of subscribing to fully functional ones.
Least Number of Hops/Chains | Use the minimum number of links to get the desired data.
Table 1.1: Example criteria for querying services in catalogs.

This subset of issues illustrates the complexity of dialogs among services and clients, even in simple cases. Resolving these issues calls for a careful choice of design options for the infrastructure, with scalability (in terms of number of users, services and interactions), extensibility (in terms of accommodating newer technologies and services), and interoperability in mind. This thesis addresses a subset of these issues, as explained in the next section.

1.3 Objectives and Contributions

In this thesis, we analyze some of the issues pertaining to the design of the service-centered infrastructure described thus far, with an emphasis on the scalability and sustainability of the infrastructure, the amount of technology effort required to support it, and how these fit into a broader view of interoperability. In particular, the thesis focuses on the aspects of the infrastructure related to the dynamic chaining of services, the traceability of metadata and services in transactions, and the introduction of middleware-type services for mediation and management. Given that the services can be combined in ways that are not pre-defined by their providers, and that only minimal assumptions can be made about the clients, the emphasis is on interoperable interfaces for the services, and efficient dialog structures among them. In the process of sorting through technological choices and trade-offs, efficiency and performance are the primary constraints, especially in terms of minimizing bandwidth usage, maintaining the thinness of the clients and the flexibility of the services, and back-tracing metadata and intermediate data transformations.
The objective of meeting the scalability, extensibility and efficiency constraints in today's rapidly evolving technology environment is the key challenge of the thesis. Although today's technologies provide the basis for building the service-centered geoprocessing infrastructure, their rate of innovation and change makes it more challenging to find a practical and long-lasting framework for building a sustainable infrastructure. Given these challenges and the uncertainty of the future, it is important to note that this thesis does not attempt to provide a definitive answer or recommend a specific solution. The contribution of this work comes from taking a broader perspective, and helping identify and prioritize the issues that are believed to be the most critical to address for the success of the geoprocessing infrastructure in this environment. As for the technologies needed to support the infrastructure, the objective of the thesis is not to develop new technologies for that purpose, but rather to harness the potential of current and emerging mainstream IT technologies and to apply them in the context of a specialized GIS infrastructure. Indeed, one of the goals of the thesis is to isolate the issues and constraints that are specific to a GIS infrastructure and the nature of GIS processing. This research can be considered as complementing the current interoperability efforts within the OpenGIS Web Mapping group [78], whose current focus is on determining standard interface specifications for web mapping. With the group running via a consensus-based process, and consisting of organizations struggling with their own legacy issues, it is not surprising that, in spite of the group's rapid progress, the issues related to the chaining complexities of the infrastructure have only recently begun to surface.
In this light, this thesis is timely, since it can build on the efforts of the Web Mapping group while moving beyond the simple interfaces to study the dynamic interactions among the services. The findings will hence assist in anticipating the next round of issues related to web mapping as defined by the Web Mapping group. The next section describes the methodology followed in identifying these issues.

1.4 Research Methodology

In order to identify and present, in a structured manner, the fundamental issues addressed by this thesis, the research is approached from three different perspectives, as depicted in Figure 1.5. Each perspective contributes to the final recommendations by either complementing or reinforcing findings from the other two perspectives. This approach allows us to easily highlight crucial issues and focus the discussion around them.

Figure 1.5 The three perspectives of the research methodology: (1) looking at existing technologies and efforts; (2) learning by doing: a prototyping experiment; and (3) identifying basic elements and setups. Together, these yield the framework of issues and choices.

We start by looking at the dynamics of existing technologies and efforts in order to draw basic lessons from similar settings, and consequently assemble a preliminary set of essential issues and choices. In light of these issues, we then implement a prototype experiment, which sheds more light on the complexities of interoperable service design, as well as the feasibility of service chaining in practice. Finally, using a simple yet non-trivial example derived from the prototyping experiment, we address additional issues in the context of investigating alternative architectural setups.

1.4.1 Looking at Existing Technologies and Efforts

The first stage of this thesis consists of assembling a preliminary set of issues, building the vocabulary to use, and positioning this research with respect to other efforts in the GIS field.
Towards that goal, we analyze the forces driving the demand for a distributed geo-processing infrastructure. This step is followed by a brief study of the interoperability and standards literature, to give an overview of the general issues. These issues are complemented by lessons learned from looking at two contemporary examples of distributed service-oriented infrastructures: the Internet and the growing field of Application Service Providers (ASPs).

1.4.2 Learning by Doing: A Prototyping Experiment

Prototyping is used in this thesis to get a deeper understanding of the constraints and requirements of designing interoperable services and chaining them. In this case, the prototyping effort centers around the design and development of a geo-referenced raster image re-projection service. The reasons for selecting this task for the prototyping experiment are explained later. This prototyping experience helped us identify some issues that, while not necessarily unanticipated, were hard to conceptualize without trying them in an operational setting.

1.4.3 Identifying Basic Architectural Elements and Setups

By this stage, we are equipped with a deep understanding of the problem and a rich set of preliminary issues. We use this knowledge to focus the discussion through a carefully selected example, which, despite its simplicity, provides the right level of complexity for studying service chaining. The example is used to identify candidate architectures and to study, in greater detail, service chaining and its repercussions on the use and scalability of each architecture. At this stage, the thesis often borrows and adapts choices and configurations from other IT areas, such as distributed databases, search engine technologies, JINI technology, W3C protocols, etc.
By incorporating such emerging mainstream technologies and standards into our analysis, we hope to arrive at solutions that can be easily integrated with the rest of IT, hence lessening the isolation of GIS.

1.5 Thesis Organization

The organization of this document follows directly from the research methodology described above. Chapter 2 presents the results of our attempts to learn from relevant technologies and efforts. It includes the background research that we feel is needed to support the development of this thesis. In Chapter 3, we describe our experience with building a prototype of a re-projection service for geo-referenced raster imagery. We use this prototyping experience in Chapter 4 to refine a framework for identifying the requirements for a scalable distributed infrastructure, for describing the operations of some of its basic elements, and for exposing the first set of general issues that arise. This framework is applied within the context of a carefully selected example to explore architectural alternatives for the distributed geo-processing infrastructure. Finally, Chapter 5 presents a synthesis of the issues explored in the thesis, and their relationship with other ongoing efforts. We also conclude with a speculation on the future of the GIS field given a successful distributed infrastructure, and present recommendations for further studies.

Chapter 2 Background

The purpose of this chapter is to assemble a preliminary set of issues relevant to our discussion of a distributed geo-processing infrastructure. The chapter outlines our background research on related efforts and technological trends in the GIS and IT fields. It also positions this work and its contribution relative to the efforts presented. Table 2.1 provides an overview of the topics covered in this chapter and their relevance to our discussion.
Topic | Reason for Inclusion
2.1 The Evolution of GIS | Needed to understand the driving forces behind the need for a distributed geo-processing infrastructure, and hence the major issues, requirements and constraints pertaining to building it.
2.2 Interoperability and Standards | Identified as being key to the success of a scalable distributed infrastructure.
2.3 Ongoing Standardization Efforts | Needed to relate the thesis content to ongoing formal efforts in the area of standardizing and distributing GIS.
2.4 The Internet | Recognized as a successful case of a distributed system environment. Discussion focuses on the success factors and the possibility of applying them to the GIS case.
2.5 Application Service Providers (ASP) | Introduced as a new model for delivering IT solutions to organizations. Discussion focuses on the trends and issues facing ASPs and their similarities with those explored in the thesis.
Table 2.1: The background at a glance.

2.1 The Evolution of GIS

In preparation for assembling the principal issues pertaining to the distributed geo-processing infrastructure, we develop an understanding of the key factors and technologies that have influenced and enabled its evolution. In this section, we briefly present this understanding of the stages, along with some examples of key factors and technologies.

2.1.1 Legacy GIS Systems

Originally, Geographic Information Systems were developed independently by software vendors, who tailored their applications to their specific users' needs, using locally created terminologies and approaches [47]. Constrained by the inability of regular databases to accommodate geodata requirements, GIS vendors typically resorted to creating their own data structures that operated within proprietary file management systems [70]. On one hand, this separation of geodata from traditional databases led to the isolation of GIS technology.
On the other hand, the complexity of these systems confined the number of users to a few GIS professionals. With GIS data tightly integrated into the systems used to create it, and gathered according to different resolutions and scales [70], it became more difficult to share and re-use this data across departments and disciplines. This sharing and re-use problem is particularly hindering given geodata's potential for re-use across independent needs and applications, its added value in spatial analysis, and its high reproduction costs [35]. However, with the continuous emergence of new technologies and a growing investment in interoperability, GIS technology has gradually transformed, as discussed next.

2.1.2 Influences of Emerging Technologies

The rapid developments in information technologies over the last two decades have accelerated the expansion of GIS in terms of both user and application bases (see Figure 2.1). For example, the declining cost of hardware, software and data over the years triggered the introduction of desktop mapping packages (such as ArcView and MapInfo). At the same time, GIS vendors attempted to increase their market penetration by focusing on simplifying the process of using a GIS. Moreover, the growth of the Internet and the accompanying advances in communications, middleware and component-based technologies led to a boom in web mapping and a new breed of GIS products (such as MapObjects) providing embeddable GIS components [76].
Figure 2.1 Evolution of GIS in response to enabling technological advances. [The figure traces the progression from stand-alone GIS workstations with data tightly integrated into the GIS system, through desktop mapping, embeddable GIS components, and databases with spatial extensions (SDE), to distributed web mapping and GIS unbundled into autonomous, interoperable services, accompanied by expanding GIS user and application bases. The enabling advances range from lower PC costs and faster processing, through client/server and middleware technologies (CORBA, etc.), to the Internet (HTML, Java, etc.), XML and ERPs, and wireless and mobile computing.]

Similarly, the move towards client/server applications and the growth of data warehousing led to the surfacing of products such as the Spatial Database Engine, as well as the emergence of spatially enabled databases such as Oracle Spatial, Informix Universal Server (DataBlade) and Sybase Adaptive Server Enterprise [70]. This trend of technology influences currently continues with what we consider to be four key drivers (portrayed in Figure 2.2) to unbundle and distribute GIS, namely:

- The growing availability of spatial data, as discussed in Section 2.1.3
- The proliferation of the Internet, as discussed in Section 2.1.4
- The new role of GIS in today's enterprises, as discussed in Section 2.1.5
- The advances in mobile and wireless technologies, as discussed in Section 2.1.6

Figure 2.2 Key drivers for unbundling and distributing GIS. [The four drivers push towards an infrastructure of distributed, autonomous, interoperable GIS services: the Internet (web mapping, push for interoperability); the availability of spatial data (satellite imagery, variety of suppliers and distributors); mobile and wireless technologies (location-based wireless services, services for thin clients with bandwidth constraints); and the role of GIS in enterprises (integration of databases, push for components).]

2.1.3 Availability of Spatial Data

In the last decade, the rate of GIS data collection (especially of the raster type) has significantly increased, due to advances in technologies such as high-resolution satellite imaging systems and GPS.
Indeed, many firms (both national and international) are entering the high-resolution satellite imagery business. For instance, in 1999 and 2000, three US commercial companies (Space Imaging, OrbImage, and EarthWatch) were scheduled to launch satellite imagery systems with one-meter panchromatic and four-meter multi-spectral resolutions [40]. More data is also expected to be available soon from NASA, which, by virtue of the recent communication satellite competition and privatization act, will be allowed to sell some of its images and purchase from other trusted commercial sources [54]. With the expected abundance of imagery data, many companies are eager to create imagery distribution channels, many using the Internet. MapFactory (http://www.mapfactory.com), for example, is in the process of creating the largest geo-referenced digital imagery archive on the web. Microsoft, on the other hand, has established the TerraServer (http://terraserver.microsoft.com), which contains USGS photographs of more than 30% of the US, as well as satellite imagery of other parts of the globe. As more non-traditional customers recognize the potential uses of geo-imagery in their businesses, and as more data providers become available, the need for a scalable infrastructure of geo-services becomes increasingly pressing. Given the collective size of this data and the diversity of its potential uses, having central repositories of this data accessible within such an infrastructure offers many benefits to users and developers of GIS applications. On one hand, the distributed solution spares users and developers the burden of maintaining the images; on the other hand, it frees them to focus their efforts on developing tools to effectively use and integrate this data. It is evident from the discussion so far that the Internet has played an important role in the race to distribute the collected imagery data, as well as other GIS data, as discussed next.
2.1.4 Impact of the Internet

The Internet has had a tremendous impact on information technologies and businesses. In the case of GIS, this relatively accessible and reliable communication infrastructure has accelerated the interest in sharing spatial data [25]. Indeed, the Internet can be credited with pushing GIS to the forefront by providing a platform for a growing number of web mapping sites and applications. Despite the increasing popularity of using the Internet to distribute and browse spatial data, it remains difficult to integrate data returned by various web mapping sites. For this to happen, data exchange and application interface standards are needed to access, view and integrate the available data in a consistent fashion [83]. In this sense, the Internet can also be seen as a major driver of the interest in GIS interoperability and standards. This interest is shared by GIS and database vendors, who also face difficulties integrating GIS data with other databases and systems, as discussed in the next section.

2.1.5 Emerging Role of GIS in Today's Enterprises

Over the last decade, enterprises have discovered that greater knowledge and efficiencies can be achieved by integrating systems and databases [36]. As competitive pressures for growth intensify, and as Enterprise Resource Planning (ERP) systems accordingly gain momentum in today's organizations, several departments (such as marketing, finance and customer service) are seeking new ways to enhance their operations through geospatial analysis. Spatial data is increasingly recognized as a corporate enterprise asset, which can provide the basis for more effective activity planning and development for the enterprise [97]. With about 80% of all data in the world's databases containing a spatial element or reference (such as zip codes, customer addresses, warehouse locations, etc.) [18], there has been an increasing gravitation towards integrating GIS databases into organizations' ERP systems.
Thus, both organizations and ERP systems providers are nowadays looking for ways to incorporate, into their systems, flexible mapping components that can be customized with little difficulty or training. This interest in unbundled GIS components in turn strengthens the need for a service-centered model of GIS.

2.1.6 Mobile and Wireless Technologies

A carefully designed distributed service-centered model can also play a critical role in the emerging mobile services industry. Advances in microprocessor-embedded GPS circuits, handheld computing and wireless networking have led to a new generation of PDAs (such as Palm Pilots, Windows CE, and Visor devices) that can incorporate the functions of a GPS receiver [75], and hence can provide a variety of location-based services to the customer. The interest in these services is high because their revenues are projected to rise from less than 30 million dollars in 1999 to 3.9 billion dollars in 2004, according to a study performed by the Strategis Group, a Washington DC consulting firm that specializes in wireless markets [64]. The dependence of these location-based services on GIS presents several practical challenges in terms of bandwidth, latency and online connectivity. As argued in Section 1.2.1, a distributed geo-processing infrastructure can greatly serve this market segment by facilitating access to and integration of data sources and services. In a context where the infrastructure is to accommodate the aforementioned GIS, ERP and wireless markets and their diverse users and constraints, while still meeting scalability requirements, interoperability of data and services is critical. Interoperability can be achieved in many ways and involves its own set of interesting challenges and trade-offs, as covered in the next section.

2.2 Interoperability and Standards

Interoperability and standards play an important role in the information era.
From bus structures to qwerty keyboards, IT standards are viewed as both drivers and reflections of the growth and maturity of the IT industry [19]. In the case of the distributed GIS infrastructure, interoperability and standards are essential considering that the success of the infrastructure is dependent on the ease with which users can access, interchange and freely mix-and-match independently-provided services. This section provides an overview of interoperability, standards and general standardization processes. It is intended to familiarize us with the issues that we are likely to face when discussing alternative architectures for a distributed geo-processing infrastructure.

2.2.1 Interoperability

The need for interoperability in IT comes as a result of the existence of multiple heterogeneous systems within organizations, the desire to transparently share information and processes among these systems, and the need to combine autonomous components for the provision of larger applications [59]. Today, interoperability is widely recognized as a new paradigm for linking these heterogeneous systems and facilitating a more efficient use of the computing resources within and among organizations. The next sections present common definitions of interoperability, its advantages and some commonly recognized issues.

Interoperability: Definition

Given our extensive use of the term "interoperability" in this thesis, we start with a definition of the word. The IT literature shows that there are many coexisting and complementary definitions of interoperability.
To begin with, interoperability is a key characteristic of an open system, which is defined by the IETF (Internet Engineering Task Force) [90] as a system that implements sufficient open specifications for interfaces, services, and supporting formats to enable properly engineered applications software to be ported across a wide range of systems with minimal changes, to interoperate with other applications on local and remote systems, and to interact with users in a style that facilitates user portability. In other words, interoperability is a property of multi-vendor, multi-platform systems (or sub-systems) that allows them to interact with each other through the interchange of data and functions [67]. Litwin [72] stresses the autonomy of these systems with the following definition: Interoperability refers to a bottom-up integration of pre-existing systems and applications that were not intended to be integrated but are systematically combined to address problems that require such an integration. Interoperability is hence a property of vendor-independent systems or components that enables them to transparently access, interchange, and integrate data and services among each other.

Interoperability: Advantages

According to the literature, interoperability provides technology users with:
1. the freedom to mix and match components of information systems without compromising overall success [66],
2. the ability to exchange data freely between systems [47], and the ability to integrate information [99],
3. the ability to request and receive services between inter-operating systems and use each other's functionalities [36].
By facilitating information access, integration and sharing in addition to inducing competition among providers [71], interoperability leads to broadened product acquisition opportunities [88], reduced project costs, faster project life-cycles and flatter learning curves for new systems [47].
It also provides a more flexible computing environment for developers, as it facilitates application development, management and maintenance and frees developers to focus on adding value in their areas of expertise [59]. In the specific case of GIS, interoperability advantages also include overcoming tedious batch conversion tasks, import/export obstacles, and distributed resource access barriers imposed by the heterogeneous GIS processing environments and data [18].

Interoperability: Issues

The interoperability literature suggests that issues related to interoperability can be studied at different levels of abstraction [46], as shown in Table 2.2.

Technical: Format compatibilities, removal of detail of implementation, development of interfaces and standards.
Semantic: Domain-specific definitions and sharing of meaning. This layer is more problematic than the technical one.
Institutional: Influence of organizational forces on the success of interoperability solutions [34]: willingness of organizations to cooperate, as well as behavioral, economic and legal factors that affect the participation of parties.

Table 2.2: The three abstraction levels of interoperability issues.

Depending on the problem at hand, the prioritization of these issues highly determines the approach followed towards achieving interoperability in that setting. As each problem is unique in its combination of requirements and constraints, it follows that there is no unique "right" approach to achieving universal interoperability. All approaches however share the challenge of satisfying current requirements while simultaneously being able to easily adapt to evolving user needs and technological changes [56], as emphasized in the next section about standards.
2.2.2 Standards

Standards represent "the deliberate acceptance by an organization or a group of people who have a common interest, of a quantifiable metric for comparison, that directly or indirectly, influences the behavior and activities of a group by permitting and (possibly) encouraging some sort of interchange" [19]. Standards exist to enable interoperability, portability and data exchange [71]. This section provides an overview of the basic characteristics that make a "good standard", highlighting the available choices and their trade-offs.

Standards: Basic Characteristics

We analyzed issues pertaining to the characteristics of good standards as portrayed in the literature along five dimensions:
1. The standardization process
2. The timing of standards
3. The standardization level
4. The scope and extensibility of standards
5. The acceptance of standards

1. The standardization process: According to the literature, standards are either developed by a recognized standards-developing organization (de jure standards), or emerge as a solution from one provider that has captured a significant share of the market (de facto standards). Table 2.3 provides an overview of these two approaches.

De Facto Standard:
- Definition: Product or system from a provider that has captured a large share of the market, and that other providers tend to emulate, copy or use in order to obtain market share [59]. In some cases, the standard reflects the technical approach of the dominant player in the market (e.g., Microsoft) [20].
- Examples: GeoTIFF [84], TCP/IP, PostScript.
- Drawbacks: Lack of backing by an industrial agreement [97].

De Jure Standard:
- Definition: Solution created by a formally recognized vendor-independent standards-developing organization such as IEEE or ISO [20]. Such standards are developed under rules of consensus in an open forum where everyone has the chance to participate [59].
- Examples: CSMA/CD (Ethernet), SQL, ISO X.11.
- Drawbacks:
Lengthy formal process sometimes leads to the solution trailing leading technologies [88].

Discussion (De Facto):
- From the vendors' perspective, competitors each try to establish and advertise their technologies as the de facto standard for the field, even if it sometimes means a "premature commoditization" of their technology [9]. Although unappealing, this rush is necessary since by waiting, vendors risk the de facto becoming a competitor's technology.
- Eventually, when market penetration is high, a public body may review the de facto technology and accept it as a standard (e.g., TCP/IP, Xerox Ethernet) [88].
- In the 1960s and 1970s, the abuse of de facto standards led to the creation of more formal standards programs that promised to free users from uncomfortable vendor dependencies [59].

Discussion (De Jure):
- Trying to design anticipatory standards for future technologies is a lengthy process since details can be argued for years [19]. This process involves a considerable amount of negotiation between parties that often have their own interests to pursue.
- Frequently impeded by a wait-and-see policy whereby vendors wait until there is broad-enough support for the standard before implementing it. For example, while the ISO X.400 standard was slowly being adopted by the industry, SMTP had already become the de facto standard for e-mail [20].
- In some cases, vendors augment the standards through unique additions that add value to the users of their proprietary technologies [9]. E.g., the "flavors" of html available as a result of the competition between Microsoft and Netscape.

Table 2.3: De Facto versus De Jure standards.

2. The timing of standards: Correct timing of the development and release of a standard has a major influence on its success and acceptance. As shown in Table 2.4, standards tend to lose some effectiveness when they are released too early or too late.
Early Standardization:
- If a solution occurs too soon (e.g., Graphic Kernel System [97]), there is the risk of delaying searches for better solutions, distracting innovation in the field [33], or freezing technologies to the point of making them obsolete [19].
- An early solution often leads to a premature commitment to detail when it might be too early to distinguish the nice-to-have from the need-to-have features [71].

Late Standardization:
- If a solution comes out too late (e.g., Asynchronous Transfer Mode [97]), the acceptance of the standard is usually delayed as users are reluctant to use the standard right away, having already implemented solutions of their own [97]. This also leads to confusion in the market in terms of coping with the various implemented solutions [71].
- Late standardization is either a result of a lengthy development and consensus process [33], or a result of a community waiting for the "best" solution. In either case, the technology misses its market [71].

Table 2.4: Timing of standards.

3. The standardization level: Depending on the problem, standardization can occur at any one of many technical levels, as shown in Table 2.5.

Data (or data transfer) standard:
- Heterogeneous systems can freely exchange data by either having each system understand the other systems' formats or by following a data or interchange standard [47]. E.g., html as a document transfer standard for the Internet.
- Complex translation methods between data formats (mostly performed as off-line batch-oriented processes) are often lossy [34], and result in duplication and redundancy of data, not to mention the big pool of needed translators [44].
- It is usually hard for an industry to move to a single data standard. Hence, standards at this level tend to take more time to develop and be adopted than other standards [22].
Interface standard for data access and processing:
- This implies the adherence to a single common interface through which heterogeneous resources and services can be accessed [100]. The focus here is on protocol interfaces that allow access to data and operations regardless of underlying implementations. E.g., TCP/IP.
- Being at a higher level of abstraction, the interface standard is vendor-independent (in terms of data format) and hence is not tied to a proprietary solution [59]. An interface standard is often coupled with a definition of an abstract data model, defining a meaningful domain-dependent subset of the data types needed.

Metadata standard for data descriptions:
- This involves defining standards for structured data descriptions that allow users to consistently query data sources about their underlying properties. Such a standard is often used when a particular field is unlikely to converge to a certain data standard [34]. E.g., DIF (Data Interchange Format), the satellite imagery metadata defined by NASA data centers [97].

Table 2.5: Levels of standardization.

4. The extensibility and scope of the standard: Extensibility and scope are probably the most critical attributes characterizing a standard, in addition to being the most challenging to realize and calibrate. They both impact the success and use of a standard in the long term. A certain degree of extensibility is desired in a standard to avoid the risk of having both the standard and the technologies using it become rapidly obsolete. However, incorporating extensibility automatically imposes the challenge of factoring-in and accommodating (today) the possible requirements and technologies of tomorrow [59]. As for the scope of a standard, it is usually determined by a balance between the number or breadth of functionalities needed and their depth [71]. Depending on the situation, depth is sometimes compromised to gain a wider acceptance and ease of use.
As an example, consider the html language: its design as a simple and easy-to-use document transfer standard for the web gave it wide acceptance but considerably limited its functionality. This limitation has driven the need for more sophisticated standards at the abstract task building and data tagging levels, such as Java and XML.

5. The acceptance of the standard: Acceptance of, or conformance to, a standard by a critical mass is key. Simply stated, the true test of a standard is in how well it is accepted and how well it meets the evolving needs of both vendors and users. This of course depends on the quality and timing of the standard, as well as on the availability of incentives offered to encourage its use [96].

In summary, a "good" standard does not only have to be consistent, expressive, simple, comprehensive and extensible, it also needs to be available at the right time and be accepted by users and vendors [98]. Achieving this objective naturally requires many compromises, as the listed characteristics are often conflicting. For instance, expressiveness and consistency often can only be achieved at the expense of simplicity. The same applies to the completeness of a standard versus its extensibility. Striking the right balance among these characteristics is a major challenge in the process of defining a standard, in addition to the difficulties associated with creating specifications of high technical quality, determining the functionalities to be included, and accommodating most parties' interests [59]. In the Internet age, these challenges are even further accentuated, as the environment's global reach and requirements can only make it harder to agree on a common denominator, while the rapid pace of today's technological advances adds to the timing and extensibility challenges. We conclude from the previous discussion that it will be almost impossible to find a "one size fits all" standard. Accordingly, we expect that each setting will be unique.
Given the essential role standards have played in GIS, it is not surprising that there are ongoing GIS-related interoperability and standardization efforts. In the next section, we look at some ongoing efforts in order to help better position this study (and later its findings) with respect to these efforts.

2.3 Ongoing Standardization Efforts

Interoperability and standardization are increasingly being considered hot topics on the GIS agenda. This section presents a subset of the efforts currently underway to solve parts of the GIS interoperability and standardization problem. The subset selected is used to illustrate the various levels at which solutions are being proposed. Naturally, these efforts are dependent on other IT standardization efforts (not covered here), such as those related to databases, network protocols and data formats.

2.3.1 Early Standardization Efforts

Early GIS standardization efforts attempted to solve the problem of data sharing and distribution by standardizing data exchange formats. Some efforts focused on importing and exporting data through common formats such as the TIGER format, while other efforts involved introducing vendor-independent formats such as the DLG (Digital Line Graph), DRG (Digital Raster Graph), DEM (Digital Elevation Model) and DOQ (Digital Orthophoto Quadrangle) formats developed by the USGS [97]. Although successful, these formats are limited to data exchange at the file level only. Furthermore, these approaches were heavily criticized for not representing the conceptual data model of the data, in terms of useful object-attribute relationships, spatial reference systems and other metadata elements [5].

2.3.2 Spatial Data Transfer Standard (SDTS)

The Spatial Data Transfer Standard was the answer proposed for finding a more complete solution at the federal level.
SDTS is a vendor-independent intermediary data exchange format following the general purpose data exchange format defined in ISO 8211 [97]. As such, it was carefully designed to represent the varying levels of data abstraction perceived as necessary for a meaningful data exchange. The work on SDTS was initiated in 1982 by the National Committee for Digital Cartographic Data Standards (NCDCDS), consisting of interested parties from private industry, governmental agencies and academic institutions. In the late 1980s, SDTS was created after the merging of the NCDCDS and the Federal Interagency Coordinating Committee on Digital Cartography. In 1992, SDTS was accepted as a Federal Information Processing Standard (FIPS 173). Currently, the Federal Geographic Data Committee (FGDC, http://www.fgdc.gov), established in 1990 by the US Office of Management and Budget for promoting and coordinating national digital mapping activities, is the organization responsible for overseeing the approval process of SDTS [5]. Despite its appeal and ability to serve a fundamental need, SDTS did not receive the big market acceptance that was hoped for at the time of its design. Unfortunately, the attempt to make SDTS as comprehensive as possible also made it very complicated, hence hindering its wide use and acceptance [95]. It also lacked proper supporting educational material, thus requiring significant investment to build and maintain the expertise and software within an organization using it. Additionally, SDTS exhibited extensibility problems when it was not able to effectively respond to un-anticipated requirements such as support for value-added extensions by users, and harmonization of metadata content with emerging international standards [5]. Most of all, SDTS arrived somewhat too late. By the time it was introduced, vendor standards were already in wide use, satisfying considerably large groups of users.
Since 1994, SDTS has experienced increased interest as a result of a presidential order to start the National Spatial Data Infrastructure, mandating that federal governmental agencies producing spatial data adopt a universal standard for facilitating data transfer and reuse [95]. The establishment of the Open GIS Consortium has also had a similar effect on the interest in SDTS [74].

2.3.3 National Spatial Data Infrastructure (NSDI)

The NSDI is "an umbrella of policies, standards, and procedures under which organizations and technologies interact to foster more efficient use, management and production of geospatial data. It includes the technology, policies, standards, and human resources necessary to acquire, process, store, distribute and improve utilization of spatial data" (Executive Office of the President, 1994). The infrastructure was created to address the needs of federal agencies faced with lower budgets and stronger demand for better quality geographic information. It was also a response to the need to organize and share the increasing amounts of spatial data independently collected by these agencies (which have benefited from cheaper technologies and advances in digital technology) [76]. The NSDI was first initiated in 1993 by the Mapping Science Committee of the National Research Council. In 1994, President Clinton issued Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure", which initiated the implementation of the NSDI and assigned the FGDC the task of leading its development [96]. The goals of the NSDI involved creating:
1. The National Geospatial Data Clearinghouse, a decentralized network of Internet sites that provides access to metadata of available geographic information [34]. The FGDC adopted the Z39.50 standard1 and developed a metadata content standard for searching and retrieving information from the clearinghouse.
2.
The National Digital Geospatial Data Framework, which consists of a set of commonly used layers of geographic information, and defines approaches for sharing data production responsibilities.
3. Data sharing standards, such as the Content Standard for Geospatial Metadata and the SDTS.

Although it is a great first step towards a unified infrastructure for geographic information, some consider the NSDI to be vulnerable to political and funding conflicts, by virtue of being led by a government agency. Critics are also frustrated by its slow progress [53]. Increasingly, the NSDI has been criticized for focusing on the production and management of spatial data instead of the development of more effective data dissemination processes, especially in light of the growing information disseminating role of the Internet. Given that it was initiated in 1993/1994, when the web had not yet taken off, the NSDI did not anticipate, let alone take advantage of, the impact and potential of this new medium, where it becomes more critical to find efficient ways of identifying and locating the needed distributed geographic information [76].

2.3.4 Open GIS Consortium (OGC)

The Open GIS Consortium is a consensus-based association founded in 1994 to address the interoperability problem in current GIS systems, and to create GIS standards that follow mainstream IT trends. The consortium's current membership count is over 200 organizations, including major GIS vendors, computer companies, system integrators, academic institutions as well as various public organizations. As such, the organization is an interesting case of the "coopetition" model, where competitors collaborate in creating value through interoperability and compete in dividing the user base and services [14]. Coopetition is known to yield faster overall market growth as it bypasses prolonged periods of shaking out competing products.
With such broad support from the community, the OGC is currently a major force in the trend of GIS openness, leading the efforts of creating open interfaces and common tools that enable communication and exchange of data and services among GIS systems. The OGC has been working primarily on two interface specifications:
1. A specification for a common object geodata model, which provides a common language for the geoprocessing community by defining interfaces for geodata modeling.
2. A specification for a common object services model, which defines interfaces for common geodata services needed to work with GIS information in a distributed environment.
For more information about these specifications, refer to the Open GIS guide at http://www.opengis.org.

1. Also adopted by the Canadian Earth Observation Network (CEONet) [69].

Of particular interest to us is an active group within the OGC known as the Web Mapping Special Interest Group. Acknowledging the rapid growth of the Internet and its role in geographic data and services dissemination, the group was formed in June 1997 with the mission of defining services that allow web-based mapping clients to simultaneously access and directly assemble maps from multiple servers using Internet protocols. Their focus is on "achieving interoperability of map assembly and delivery middleware within a heterogeneous distributed computing environment" [78]. In the form of organized testbeds, the group provides a forum for web mapping technology providers and users to determine standard interface specifications that will facilitate web-based mapping. After the first web mapping testbed (WMT 1), the group published three server specifications of interest to us:
1.
The Map request, sent in the form of a cgi call to the server, takes the geographic bounding box of an area and its desired picture pixel width and height as specified by a client (among other parameters), and returns the desired area at the right scale in either a picture or a graphic format.
2. The Capabilities request allows users to query a server for its capabilities, its services and the layers it serves. Appendix B discusses the Map and Capabilities requests in more detail.
3. The FeatureInfo request allows users to ask for information on particular features on a map retrieved using the Map request. This request is still geared towards the picture case and is less defined in the graphic element and feature cases.

WMT1 focused on covering the basic process of providing information requested by users in the form of an overlay of transparent images fetched from multiple map services compliant with the above specifications. Although suitable for viewing and simple querying of geographic information, such an approach was quickly found not to be fit for more sophisticated uses, where the raw underlying GIS data is needed. For example, there are cases where users need to have more control over specific features of the data as well as its display characteristics in terms of styling, colors, and bands, among other attributes. In the second phase of the testbed (IP2000), the focus consequently shifted to providing users with more control over the data requested, and giving them the option of fetching data as features and coverages, as opposed to image snapshots. At the time of this writing, the IP2000 workgroups are working on specifications for legend retrieval, coverage and feature interfaces, as well as security and other e-commerce services (for more information on these efforts, we refer the reader to the OGC web page). The OGC web mapping testbeds provide us with a rich set of examples and specifications that are directly related to our work.
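To make the Map request concrete, the following Python sketch assembles such a cgi call. The parameter names (WMTVER, REQUEST=map, LAYERS, SRS, BBOX, WIDTH, HEIGHT, FORMAT) follow the WMT server specification as published by the OGC; the server URL and layer names here are hypothetical.

```python
from urllib.parse import urlencode

def build_map_request(base_url, layers, bbox, width, height,
                      srs="EPSG:4326", fmt="PNG"):
    """Assemble a WMT-style Map request as a cgi query string.

    bbox is (min_x, min_y, max_x, max_y) in the units of `srs`;
    the server returns a rendered picture of that area scaled to
    the requested pixel width and height.
    """
    params = {
        "WMTVER": "1.0.0",           # protocol version
        "REQUEST": "map",            # the Map request
        "LAYERS": ",".join(layers),  # layers to draw, back to front
        "SRS": srs,                  # spatial reference system
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),         # picture width in pixels
        "HEIGHT": str(height),       # picture height in pixels
        "FORMAT": fmt,               # picture format, e.g. PNG or GIF
    }
    return base_url + "?" + urlencode(params)

# A hypothetical client asking a hypothetical server for a Boston-area map:
url = build_map_request("http://example.com/wmt", ["roads", "rivers"],
                        (-71.2, 42.2, -70.9, 42.5), 600, 400)
```

Because every compliant server interprets the same parameters, a client can send the same bounding box and pixel dimensions to several servers and overlay the returned transparent images, exactly as exercised in WMT1.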
As noted in Section 1.3, we will comment on some of the efforts of the WMT and the OGC, and present our views on the next round of issues that may be addressed in their context. Finally, it is important to point out that the OGC efforts presented here are not entirely independent of the other efforts discussed in this chapter. In many ways, all these efforts complement each other, and their respective organizations recognize that they have common basic objectives. For instance, the FGDC is an active sponsor of the OGC web mapping testbeds. Furthermore, both the OGC and the FGDC have worked together in committees and workshops related to the Digital Earth Initiative1 [74]. The OGC is also involved in international standards working groups, such as the ISO/TC 211, and works closely with the ANSI X3L1-2 (a committee that works on spatial extensions to SQL). Finally, with areas that are still under development, the task of reaching consensus and the necessity of being responsive to changes in the environment become even more difficult. Accentuated by the complex nature of GIS and the dramatic change in the conceptualization of GIS systems and components, the consensus-based OGC process can sometimes be lengthy [34]. Moreover, the presence of many competitors in the organization sometimes undermines the process through political compromises as well as contradictory and counterproductive pushes for technologies that serve individual members' own interests best [69].

2.3.5 World Wide Web Consortium (W3C)

The W3C is a global organization comprised of companies, non-profit organizations, industry groups and government agencies. Recognizing the many advantages of representing graphics sent to browsers in vector rather than raster formats, the W3C has been increasingly interested in standardizing the format and structure of vector data.
The advantages of vector formats stem from the considerably smaller size of a vector data representation compared to an equivalent raster one. The smaller size of some vector data makes it more fit to be transmitted over a network. Vector data also allows for more interactivity, as it retains the object nature of its constituent elements and scalable information. It is also more easily displayable in a wide range of resolutions without loss of quality [109]. These points are particularly important if we consider the emerging need to provide GIS services for mobile users, whose data connectivity is typically bandwidth-limited. The standardization of the vector format will be highly beneficial to the GIS community, since many common vector formats are proprietary, inhibiting the efficient exchange of this data and affecting its manipulation and display in a browser. In this context, the role of the W3C is to produce a specification for a scalable vector graphics format, written as a modular XML namespace [109]. XML is favored because it is extensible and text-based, and hence facilitates the tasks of exchanging and reading vector data. Towards the goal of standardizing the vector format, the W3C formed the Scalable Vector Graphics (SVG) working group in August of 1998, to design a common vocabulary (DTD) for scalable vector data.

1. The Digital Earth Initiative is an ad-hoc inter-agency working group formed to define the US federal participation in Digital Earth. The latter is defined as a virtual representation of the planet that enables users to explore and interact with vast amounts of natural and cultural information gathered about the Earth (http://digitalearth.gsfc.nasa.gov). This initiative naturally requires a certain level of interoperability and standardization of spatial data, metadata and services.
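The size and scalability advantages of vector encodings claimed above can be illustrated with a small sketch. The fragment below (hypothetical coordinates; a minimal SVG-style fragment rather than a full conforming document) encodes a four-vertex line feature as XML vector markup and compares its size to an uncompressed raster of the same display area.

```python
def polyline_svg(points, width=400, height=300):
    """Encode a line feature (e.g. a road) as a minimal SVG fragment.

    The feature remains an object: a client can restyle or rescale it
    without loss of quality, and the encoded size grows with the number
    of vertices, not with the display resolution.
    """
    pts = " ".join(f"{x},{y}" for x, y in points)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">'
            f'<polyline points="{pts}" fill="none" stroke="black"/>'
            f'</svg>')

# A hypothetical road with four vertices.
road = polyline_svg([(10, 20), (120, 80), (260, 90), (390, 250)])
vector_bytes = len(road.encode("utf-8"))
# An uncompressed 8-bit 400x300 raster of the same view occupies 120,000 bytes.
raster_bytes = 400 * 300
```

The vector encoding here is a few hundred bytes against 120,000 for the raster, and redrawing it at twice the resolution costs nothing extra, which is exactly why such formats suit bandwidth-limited mobile clients.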
Several proposals have been submitted for SVG compliance, including PGML (Precision Graphics Markup Language), WebCGM (Web Computer Graphics Metafile), and VML (Vector Markup Language) [49]. In a way, these vector standardization efforts complement the efforts of the web mapping group, whereby the work on the GIS servers' interfaces would not be complete without a standard for the data returned from these servers.

2.3.6 Other Efforts

Before concluding this section on ongoing GIS interoperability efforts, it is important to highlight some of the international efforts. One such effort is driven by the ISO/TC 211 committee, which includes members such as the European Petroleum Survey Group (EPSG), the International Society of Photogrammetry and Remote Sensing (ISPRS), the Japan National Spatial Data Infrastructure Promoting Association, Europe's Joint Research Center and the Open GIS Consortium. This committee works on emerging international standards for digital geographic information, such as the ISO/TC 211 Geographic Information/Geomatics standard. The ISO/TC 211 has however been criticized for focusing only on data rather than on practical implementation aspects [97]. The Canadian Geospatial Data Infrastructure (CGDI, www.geoconnections.org) is another independent effort that tackles the problem of accessing heterogeneous and distributed spatial data in an interoperable fashion [44]. Products developed as a result of this initiative include:
1. The Open Geospatial Datastore Interface (OGDI), an API that sits between an application and a geospatial data product to provide access to that data. The drawback of this method is that a separate driver needs to be developed for each data format for the API to be able to call it.
2.
The Geographic Library Transfer Protocol (GLTP), a stateful protocol (in contrast to the stateless protocols developed by OGC) that retains knowledge of all transactions pertaining to a connection, hence allowing for more efficient processing of successive related queries or transfers of geospatial data. Another effort that is worth mentioning in this section involves the .geo proposal made, at the time of this writing, to the Internet Corporation of Assigned Names and Numbers (IACNN) by SRI, a california-based company. SRI proposed a location-based top level domain (.geo) which supports geo-spatially indexing and referencing of data available on the Internet. Briefly, the pro- 38 posal suggests assigning unique domain names for grid cells (of different scales) that would cover regions of the earth. Each geographic cell is assigned a cell server, which is responsible for storing and responding to queries for geodata that lie within its cell boundary. Cell servers would be maintained by organizations called geo-registries, which can charge users for registering their data with them. Such a setup would allow users to search for data by location via geo-enabled search engines. The details of the proposal are available at http://www.dotgeo.org. Although intriguing and innovative, the proposal requires many issues to be resolved at both the technical and the regulatory levels. It was criticized by some for proposing a fee-based registration mechanism, for adding yet another level of complexity to the Internet, and for restricting access to the public by bureaucratizing geospatial data [91]. Consequently, the proposal was turned down by ICANN. Given the utility of searching for data according to its geographical location, we believe that, despite these issues, the technical and regulatory details will be eventually be worked out, leading to a consistent scheme of geo-referencing information on the Internet. 
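The grid-cell idea behind .geo can be sketched in a few lines. The naming rules below are hypothetical, not those of SRI's actual proposal: the function simply snaps a coordinate to a grid of a given cell size and encodes the cell's southwest corner as a domain label.

```python
# Hypothetical illustration of .geo-style cell naming; the real proposal's
# naming scheme differs. cell_domain() maps a lat/lon to the domain label
# of the enclosing grid cell, identified by its southwest corner.
def cell_domain(lat, lon, cell_deg=10):
    lat0 = int(lat // cell_deg) * cell_deg   # southwest corner latitude
    lon0 = int(lon // cell_deg) * cell_deg   # southwest corner longitude
    ns = "s" if lat0 < 0 else "n"
    ew = "w" if lon0 < 0 else "e"
    return f"{abs(lat0)}{ns}{abs(lon0)}{ew}.{cell_deg}d.geo"

# A query for data near Boston (42.36 N, 71.06 W) would be routed to the
# cell server registered under this (invented) domain:
print(cell_domain(42.36, -71.06))  # 40n80w.10d.geo
```

A geo-enabled search engine could then resolve such a name to the cell server holding metadata for all geodata registered within that cell, which is what makes search-by-location tractable under this scheme.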
Such a scheme would be very useful in the distributed geo-processing infrastructure, especially in the context of some of the architectural setups discussed in Chapter 4. In the next two sections, we present two distributed models for delivering and accessing services over networks. The goal is to identify and anticipate, through an understanding of these models, the issues that are likely to arise in the distributed geo-processing infrastructure.

2.4 The Internet

The Internet is certainly the most talked about computer-related phenomenon of the nineties. It is presented in this thesis as a powerful case of a successful distributed environment. In including a discussion about the Internet here, we are interested in first extracting the key factors that contributed to its wide proliferation and success at the technological, architectural and organizational levels. Then, we want to identify those factors that are most likely to apply to the case of a distributed geo-processing infrastructure. In addition, we hope to gain some insight into what makes the distributed geo-processing infrastructure unique.

2.4.1 Identifying Key Success Factors

This section summarizes our understanding of how the Internet, a loosely-organized international collaboration of autonomous, interconnected heterogeneous sub-networks, can provide the illusion of being a single homogeneous network [20], and how this federation of autonomous systems can cooperate for mutual benefit to form such a scalable environment [88]. The key factors that contributed to the success of this phenomenon, and that are relevant to our discussion of a distributed geo-processing infrastructure, are:

- Key factor 1: Simplicity and tolerance of underlying standards
- Key factor 2: Consensus for consistency
- Key factor 3: Extensibility and evolution

Key factor 1: Simplicity and tolerance of underlying standards

Standards constitute the foundations of the Internet.
They are the glue that keeps computers and applications working in harmony in such a global decentralized environment [69]. We argue that it is the simplicity and the tolerance of these standards that have enabled the rapid growth of the Internet. Consider, for instance, http, the underlying communication protocol which governs the order of conversations exchanged between clients and servers [63]. The http protocol was designed with network efficiency and scalability in mind.1 Its simplicity and tolerance partially derive from its stateless nature, which considerably simplifies its request/response operations. Similarly, the Internet's most common document transfer standard, html, played an important role in the proliferation of the web. Html's rapid adoption was due to its ease of construction, given the availability of various authoring tools, and its ease of use, given the simplicity of browser usage. In both examples (http, html), compliance with the standards is enforced operationally rather than through a formal process: web pages and web servers need not be certified by any formal authority, or registered before use. Instead, a document is considered conforming to the html standard if it can be displayed in a rather consistent manner across available browsers. Similarly, a web server is accessible if it does not return an error at the time of a request. The resultant tolerance of the environment simplifies the design of clients and servers, and is therefore a major driver in the proliferation of the Internet.

Key factor 2: Consensus for consistency

Despite the diversity of providers and systems, what brings the Internet together and maintains its wide connectivity is a consistent naming of URLs as well as a consistent assignment of unique identifiers for IP addressing and domain names. Because these assignments have to be unique,

1. http, in use since 1990, is based on a request/response paradigm.
A client establishes a connection with a server and sends it a request in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible body content (for more details refer to http://www.w3.org/Protocols/).

they are coordinated by the Internet Assigned Numbers Authority (IANA) and the Internet Network Information Center (InterNIC), which lead the organizations responsible for assigning IP addresses and domain names, respectively. IANA and InterNIC belong to a larger group of organizations that maintain Internet consistency and standards. Table 2.6, which lists the most visible organizations, shows that the coordination and cooperation needed to maintain Internet connectivity happen without a central coordinating authority. Instead, consistency is maintained by a handful of organizations, each influential in its own right. Collectively, these organizations form a checks-and-balances system for the related efforts [68]. Figure 2.3 summarizes key factors 1 and 2.

Table 2.6: Organizations involved in Internet coordination and standardization.

- World Wide Web Consortium (W3C): Sets data exchange standards for html, XML, etc.
- Internet Engineering Task Force (IETF): A volunteer organization that focuses on the evolution of the Internet and on keeping it running smoothly as a whole. It covers developing Internet protocols and standardizing what has already been developed [69].
- Internet Architecture Board (IAB): Responsible for defining the overall architecture of the Internet (backbones, networks, etc.). Also keeps track of various numbers that must remain unique, such as the 32-bit IP addresses [69].
- Internet Society (ISOC): Supervisory organization that comments on Internet policies and practices and oversees other boards and task forces, including the IAB and the IETF.
- Internet Assigned Numbers Authority (IANA): Leads the organizations responsible for assigning IP addresses.
- Internet Network Information Center (InterNIC): Leads the organizations responsible for assigning domain names.

Figure 2.3 (Simplicity of Internet standards and consensus for consistency) depicts a client and a server linked by (1) the communication protocol http (simple, stateless, tolerant, designed with network efficiency and scalability in mind); (2) the document transfer standard html (simple to use and learn via browsers); and (3) consistency through consistent naming of URLs and unique identifiers for IP addressing and domain names.

Key factor 3: Extensibility and evolution

It is interesting to understand how a distributed environment like the Internet and its technologies gradually evolve to accommodate an exponentially growing pool of users and information [40]. An example of such an evolution was triggered by the increasing difficulty of locating information in the growing web, to which the market responded by developing new tools that assist users in categorizing and searching for information. Such tools include search engines, web crawlers, web directories and meta-search sites [37]. Similarly, as Internet usage grew, the demands of its users broadened. They needed more customization, more functionality, and more control over data flow. Html alone could no longer fulfill the users' needs. New technologies were introduced to target these new requirements: web servers were extended via back-end gateways [20], and more sophisticated scripting languages (such as JavaScript and Active Server Pages) were developed.
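The stateless http request/response exchange described in footnote 1 can be sketched concretely. The host, path, and headers below are illustrative only; the point is that each request is self-contained, so the server keeps no state between requests:

```python
# Illustrative http/1.0 exchange (host and path are invented).
request = (
    "GET /maps/index.html HTTP/1.0\r\n"  # request method, URI, protocol version
    "Host: www.example.org\r\n"          # request modifiers / client information
    "Accept: text/html\r\n"
    "\r\n"                               # blank line terminates the headers
)

response = (
    "HTTP/1.0 200 OK\r\n"               # status line: version + success code
    "Content-Type: text/html\r\n"       # entity meta-information
    "\r\n"
    "<html>...</html>"                  # possible body content
)

status_line = response.split("\r\n", 1)[0]
print(status_line)  # HTTP/1.0 200 OK
```

Because the request carries everything the server needs, a failed request can simply be reissued, which is part of the tolerance that simplifies the design of clients and servers.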
In this context, Java delivered a platform-independent expansion of functionality, and XML provided a standard, extensible way of exchanging information along with its semantics and structure. Gradually, other technologies were built on top of those mentioned, such as the Java-based distributed computing platform JINI, and the XML-related technologies of XSLT and XML-RPC, among others. These technologies have followed the evolution of the Internet from its beginnings as a simple network to what it is today: the foundation of e-commerce as well as of many enterprise computing environments.

2.4.2 Implications

Even though it remains to be seen how challenges such as possible information overload and security risks will be addressed on the Internet [20], there are some key lessons worth exporting to the GIS paradigm. One conclusion from our brief Internet study is that the success of the Internet has been as much about the technology standards as about the culture of interoperability embraced by its users, and their shared belief in the importance of interoperability in such a medium. The embrace of interoperability will be equally critical for the success of geo-processing. As discussed in the previous section, the Internet does not provide a "one size fits all" solution to every user. Instead, it is supported by a range of technologies targeting different needs. It is up to the users to pick and combine the technologies they need to create solutions that best fit their needs and constraints. This observation implies that, in the case of the geo-processing infrastructure, we should not aim to find the one optimal configuration that will work for this domain. Instead, the goal should be to identify both the constituent elements that can be combined, and the issues that need to be addressed in order to provide users with the flexibility of assembling their own solutions.
In addition to the above conclusions, this study highlighted the importance of having simple, consistent and tolerant underlying technologies that can support a scalable distributed infrastructure. Operational compliance with standards, as well as the distribution of responsibilities across cooperating organizations, were also identified as key factors in maintaining the infrastructure and its growth without resorting to centralized controlling authorities. The value of this study and its implications lies in providing us with an initial set of relevant technologies, requirements and issues to consider. In our subsequent discussion of the distributed geo-processing infrastructure, we will need to determine whether these requirements and issues are applicable to it. We will also need to examine whether some Internet technologies (such as those related to security, e-commerce, authorization, etc.) are generic enough to be directly used in GIS, or whether GIS characteristics will require specialized solutions.

2.5 Application Service Providers (ASP)

A discussion of Application Service Providers is included in this chapter as an example of a commercially available practice of delivering functionalities to users as Internet-based services. We hope to draw lessons and strategies from the ASP model, as we expect it to share similarities with the distributed geo-processing infrastructure, both in terms of key drivers and in terms of the challenges induced by a changing IT environment and the complexities of service integration.

2.5.1 Overview

ASPs are independent third-party providers of software-based services, which typically specialize in a slice of business computing [102], delivered to customers as packaged ready-to-use resources across a wide area network [6]. Although the ASP market only recently surged, the literature indicates that the core concept is not new: it was first proposed a third of a century ago by Greenberger at MIT [102].
Its recent reemergence in the IT world is occurring as a result of the convergence of today's technology and market conditions, as depicted in Figure 2.4. The evolving technologies and practices have enabled the ASP concept to prove its potential for cutting costs and providing flexibility to users, two of its many advantages (see Table 2.7).

Table 2.7: Advantages of the ASP model.

Advantages for Users:
- Flexibility: customers pick what's right for them.
- Avoiding being locked in by a certain vendor.
- Faster time to benefit; no management or maintenance of software (transparent to users).
- Access to otherwise out-of-reach technologies.

Advantages for Service Providers:
- Specialization according to expertise.
- Sharing the costs of expensive resources among a number of customers.
- Integration of services increases the value of each service.

Figure 2.4 (Convergence of business and technology conditions) depicts the conditions that converge to enable the Application Service Provider model:

Business conditions:
- Fast pace of business.
- Increased competition.
- Prevalence of e-commerce.
- Increase of on-demand computing provision.
- Need for focusing on core business competencies, especially with the rise of dot-coms who can't afford to do everything internally.
- Cutting costs of ownership and daily management when only simple subsets of functionalities are needed.

Technology conditions:
- Growing global telecommunications infrastructure based on IP technology (increasing bandwidth, lower costs).
- Experience in services such as security (SSL, SKI) and virtual private networks.
- Maturity of XML as a data exchange standard.
- Maturity of server-based computing (the technologies allowing WWW applications to run centrally, such as CGI, Java, Windows DNA).
- Availability of easily configurable software that lends itself to online delivery as an Internet service.

Common examples of services offered by ASPs range from back-office automation to e-commerce and customer relationship management applications.
An interesting example involves the recently revealed .Net initiative of Microsoft, in which the company offers to provide its suite of software as web services on the Internet, instead of directly selling it to its clients [30]. Given increasing demands for a wealth of information and resources distributed across a network, Microsoft aims to add value by hiding, from the user, the complexity of locating this information, retrieving it, and integrating it with other information on various devices. XML plays a prominent part in this approach, as it does with most ASPs, because it allows maximum customization of solutions while remaining standard in terms of describing them. At the time of this writing, customers and competitors were waiting to see whether Microsoft would succeed in its "services instead of software" vision [62].

2.5.2 Issues

Before we discuss some of the ASP-related issues and their relevance to the distributed geo-processing model, we note the difference between the abstraction levels at which the two models provide their solutions to users. Even though both models advocate service-centered architectures, the definition of service is different in each case. Under the ASP model, a "service" refers to a hosted, prepackaged application (which includes a user interface and data storage) that is ready for use by customers. In the case of the geo-processing infrastructure, a "service" is an autonomous GIS component that can be called by other applications via a standardized access protocol. Given this difference in defining services, and its implications for the service provision model and its dynamics, it follows that only a subset of the ASP issues will be applicable to the GIS model. In this section, we discuss those issues we think are most relevant for the GIS case.

Scalability

In any distributed system, scalability of the underlying infrastructure is critical to the sustainable growth of that system.
While an integrated design for a global ASP infrastructure does not yet exist, ASP service providers are individually working on the scalability of their own service provisions, to help ensure acceptable performance when service loads are high. Towards that goal, service providers are experimenting with their internal setups of servers, fine-tuning variables such as the number and horsepower of main and redundant servers, as well as the bandwidth available between them and their clients. Other efforts are directed at maintaining clients' data integrity and enforcing security in the distributed setup. The results of the efforts related to bandwidth, data integrity and security are of particular interest to us when discussing the GIS infrastructure: the size of exchanged GIS data can be potentially large, and the challenge of maintaining data integrity is considerable, given the likelihood of data transformations applied to produce data that corresponds to specific clients' requests and needs.

Integration of Services

Experts in the industry are still waiting to see if the single-application approach will be viable in the long run. It is argued that while this approach is suitable for businesses needing just one application, it is more likely that companies are going to need more than one application at a time. If customers choose to locally integrate the independently-provided services, they risk running into several known problems. On one hand, the various services could be using different models, formats or databases, hence inhibiting interoperability among the applications. On the other hand, companies risk losing their customization of individual services, as well as that of the integrated product, if services are upgraded or modified by the providers. In response to these problems, the ASP market is experiencing a surge of partnerships among service providers.
These providers are interested in adding value to their services by integrating them with other complementary ones to create one-stop shops of coherent and customized applications. This approach introduces its own problems. The most noted ones revolve around concerns about the accountability of service providers in the case of problems or need for customer support. Indeed, it has been repeatedly reported that service providers point fingers at each other when an "integrated" service under-performs or experiences problems [82]. In order to facilitate service integration and data transfer among ASPs, the ASP Industry Consortium, a non-profit organization comprised of ASP players worldwide, was formed in May 2000, in an attempt to foster the development of open ASP standards and common practices [6]. It is still early to determine the degree of success of this consortium, as well as the longevity outlook for the ASP approach.

The ASP Market

Given the interoperability trend in the market, the key question becomes that of knowing how service providers can differentiate their offerings without compromising interoperability. We can distinguish two trends that are evolving in the ASP market. On one hand, some service providers are trying to reach the widest possible segment of clients for their services by making them as generic as possible. In such a market, branding offers an edge to providers with known names, which expands their product visibility and hence increases their chances of gaining new clients. ASPs offering generic services can also increase their value by integrating applications, providing one-stop shops and a single point of contact for their clients. The other trend in the ASP market revolves around specialization, whereby some service providers choose to specialize in providing services for specific domains and industries. These ASPs produce highly customized products for their targeted niche of customers.
It is clear that the business models in this market are still evolving, and so are the methods used to charge clients for used services. ASPs are also still experimenting on that front: some are charging customers per individual service use or per data size transferred or processed, while others are using simple flat periodic fee structures for their services [3].

2.5.3 Implications

Although the ASP approach is relatively young, with a yet unproven business model and many of its complexities not yet fully resolved, it still provides us with some insights into the issues and dynamics of the distributed geo-processing infrastructure. Some of these issues are summarized below. In terms of software design issues, we saw that a service-centered model requires a shift in the engineering and deployment of software applications. This shift to "service engineering" clearly puts legacy system providers at a disadvantage. This explains why software giants such as Microsoft (and ESRI in the case of GIS) have initially been slow at aggressively developing software solutions for thin-client computing. Such big players are also justifiably concerned about the impact of the service model on the sustainability of their businesses and profits. However, it seems rather unlikely that the service model will replace software as it is currently defined, even in the long run. Although there are specific applications and environments where it is effective, it will not always be the most appropriate route for delivering applications to users. We believe that this observation also strongly applies to the distributed geo-processing infrastructure. In terms of technologies, the discussion of ASP issues also highlights the growing role of XML as a foundation for interoperability and data exchange, a role that is easily transferable to the geo-processing infrastructure.
Similarly, technologies developed and used for maintaining security, data integrity and user authentication in the ASP field can also be adopted in the GIS case. Throughout our discussion, we will highlight any factors that are unique to a distributed GIS solution, leading to a variation in the treatment of these issues. As for market dynamics, the ASP study has raised the issues of commoditization versus specialization, the importance of branding in this market, the value of integrated services to clients, and the convenience of one-stop shopping for integrated services. The study hence hints at the potential value of new combinations and types of "integration" services that may sit between giant repositories of GIS data and the end-user client applications in many niche markets. We will revisit these issues in the conclusion, when we discuss the dynamics of the distributed geo-processing marketplace. To conclude, this chapter provided us with a glimpse of the issues that are likely to play out in the discussion of architectures supporting a distributed geo-processing infrastructure. Some of these issues are refined in the next chapter, where we describe a prototyping experiment that sheds light on other issues that are more practical in nature.

Chapter 3
Case Study: Raster Image Re-projection Service Prototype

3.1 Introduction

An integral part of our research methodology consists of a preliminary identification of key choices and issues through a carefully selected prototyping experiment. The rationale behind the prototyping strategy is that the experience can help reinforce and/or identify issues that, while not necessarily unanticipated, were nonetheless hard to conceptualize without actually trying things in an operational setting (refer to Section 1.4.2).
In this chapter, we report on our attempt to design a preliminary interface for raster image re-projection services, and on our experience with building a service complying with the designed interface. It is important to clarify that our objective is not that of developing an optimal raster image re-projection engine. Instead, the prototyping effort is for the sole purpose of gaining a deeper understanding of the options and issues associated with the distributed geo-processing infrastructure problem. In particular, in our presentation, we consistently highlight the unexpected difficulties faced in both the design and implementation phases, arising from the lack of GIS interoperability and the limitations of legacy GIS systems in accommodating the distribution and scalability requirements. We then translate the highlighted difficulties into specific characteristics for interfaces and services, needed to ensure a scalable and reliable distributed GIS infrastructure, especially in the area of service integration and chaining. We begin by providing the reader with an overview of the raster re-projection process, and of the characteristics that make it an appealing service to experiment with and learn from.

3.2 Re-projection: Overview and Motivation

3.2.1 Overview

This section is not intended to provide a detailed presentation of re-projection, but rather to familiarize the reader with its concept and complexities. For more information, the reader is referred to the projection and re-projection references included in the bibliography. Re-projection is the process of transforming an image from one projection system to another. Projections1 are methods of systematically representing the three-dimensional information from the curved surface of the Earth in two dimensions. A set of mathematical equations is used to convert between latitude/longitude coordinates and planar ones.
Since any spheroid representing the Earth cannot be transformed to a plane without distortion, other surfaces are used for projecting the globe, such as the cylinder, the cone and the plane (as shown in Appendix C.1). Depending on the surface selected to approximate the spherical Earth, distortions to the area, shape and/or scale of the projected image are introduced. Projections have therefore been classified according to the characteristics of the map they preserve (summarized in Appendix C.2). Depending on the image size, shape and location, users can pick a projection that minimizes the distortions for their purposes. The procedure for re-projecting maps from one projection to another differs depending on whether the map is in vector format or raster format. For vector data, the process is simple: for each point on the map, the proper equations are applied to first obtain the equivalent latitude/longitude values of that point. Next, these values are transformed into coordinates in the target projection. For raster data, the process involves an additional level of complexity due to the structure of raster imagery as arrays of pixels with varying intensity values. Re-projection in the raster case involves transforming one Cartesian grid to another. For each cell (x,y) in the newly projected image, the corresponding coordinate (xo,yo) in the original raster image must be calculated. As shown in Figure 3.1, problems arise when the cell (x,y) does not coincide with a cell (xo,yo) in the original image. Interpolation or resampling around (xo,yo) is then needed to estimate the pixel brightness value at (x,y) [85]. Appendix C.3 summarizes the three most commonly used interpolation/resampling methods: the nearest neighbor, the bilinear and the cubic interpolation methods.

1. Interestingly, projections have been the topic of discussions in thousands of cartography books and papers dating back to 150 A.D. [89].
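The inverse-mapping-plus-resampling loop described above can be sketched in a few lines. The `inverse_project` function below stands in for the real projection equations (mapping a target grid cell back to source image coordinates), and the identity transform used in the demonstration is purely illustrative:

```python
# Sketch of raster re-projection by inverse mapping with nearest-neighbor
# resampling. inverse_project is a placeholder for the actual projection
# equations; everything here is an illustrative assumption.
def reproject(src, out_w, out_h, inverse_project):
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # For each cell (x, y) of the new image, compute the matching
            # location (xo, yo) in the original image...
            xo, yo = inverse_project(x, y)
            # ...then resample: nearest neighbor simply rounds to the
            # closest source pixel (bilinear/cubic would interpolate
            # among neighbors instead, at higher cost).
            xi, yi = round(xo), round(yo)
            if 0 <= yi < len(src) and 0 <= xi < len(src[0]):
                out[y][x] = src[yi][xi]
    return out

src = [[1, 2], [3, 4]]
# Identity "projection" just to exercise the loop: output equals source.
print(reproject(src, 2, 2, lambda x, y: (x, y)))  # [[1, 2], [3, 4]]
```

The per-pixel inverse mapping and resampling is what makes raster re-projection computationally intensive: the loop body runs once for every output pixel, and a better interpolation method multiplies that cost.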
Figure 3.1 Raster imagery re-projection.

This extra interpolation/resampling step makes the raster re-projection process computationally intensive and slow. Depending on the uses of the imagery, appropriate approximation methods may be used to bypass this step and speed up the re-projection. However, this gain in speed comes at the expense of a loss in data quality, creating an interesting trade-off which we explore later.

3.2.2 Motivation: Reasons for Picking Re-projection

Raster image re-projection is a relevant service to study and prototype for several reasons. Undoubtedly, the growing volume of raster imagery becoming available in different projection systems (Section 2.1.3) is heightening the need for a specialized service to transform these images from their native coordinate systems to those defined by users. This need has also recently become apparent in the context of the first phase of the OGC Web Mapping Testbed (see "Open GIS Consortium (OGC)" on page 35). Essentially, the testbed's main demo consisted of overlaying several maps fetched (according to the WMT specifications listed in Appendix B) from different map servers in the coordinate system chosen by the user. This step turned out to be more difficult than originally expected because, in most cases, only a few servers at a time provided maps in the chosen coordinate system. In order to make the demo more effective, servers needed to be able to re-project their maps into the chosen coordinate system. For the purposes of the WMT testbed, participants solved the re-projection problem in different ways: some converted and stored versions of their maps in the expected coordinate systems, while others developed local re-projection modules to re-project their data, as well as others', before returning the maps to the client. In both cases, the re-projection was transparent to the end user, who had no control over the process or its parameters.
The WMT demo highlighted the increasingly pressing need for a generic re-projection service that can be transparently called by other services on the network, and selectively used as an intermediate step in the delivery of customized maps. In this context, the true value of such a re-projection service stems from its indispensable role in service chains. Given this role, the design and implementation of a prototype for such a re-projection service compel us to identify and incorporate, at an early stage, issues that are directly related to service chaining and integration in a distributed infrastructure. A further advantage of implementing a re-projection service specifically for raster imagery is that it focuses the discussion in many ways. The simple structure of geo-referenced raster imagery, coupled with its richness in information, allows the discussion to center on the technical and architectural aspects of service design and chaining without the added complexities of semantics. Indeed, the use of raster imagery in examples throughout this thesis helps push the issues of bandwidth, compression and performance in a distributed environment. Finally, what makes the re-projection case particularly interesting is that the seemingly simple re-projection step can readily complicate the chaining setup with the choices to be made regarding resampling methods, image geo-referencing representations and approximation methods used (as illustrated later in this chapter). This in turn highlights issues pertaining to tracking image metadata and transformations, as well as to finding ways of hiding such complexities from the end user. To summarize, given the need for generic re-projection services, their role in service chains, and their ability to focus the discussion on relevant issues, we believe that raster image re-projection prototyping will provide us with the insights we are looking for.
3.3 Re-projection Service: Design Process and Preliminary Interface

The first step of the prototyping experiment involves devising a preliminary interface for raster re-projection services. The interface is later used in the implementation and testing of a sample service. Our goal in this first step is to start with a simple re-projection interface that is independent of possible underlying implementations, and that can accommodate, as much as possible, current and expected requirements. We emphasize the interoperability and simplicity criteria behind the interface, as they are prerequisites to ensuring that compliant services are easy to use (alone and in chains) and are effortlessly replaceable and substitutable by users. A summary of the design process as well as the preliminary interface are presented next, with a focus on the design decisions involved and the unexpected obstacles encountered.

3.3.1 Interface Design Process

In order for the re-projection service to chain easily with other services defined in the broader setting of the OpenGIS web mapping group (Section 2.3.4), we followed a design process that is consistent with the one used by the WMT group to define the GetMap and GetCapabilities requests outlined in Appendix B. The only major difference is that our design process did not benefit from a WMT-like group/testbed environment. Such an environment can systematically inject more feedback, more input and more control over variables into the design process. The re-projection interface presented here was refined based on our own implementation experiences only. Nevertheless, it succeeds at providing the right depth at which we can explore distributed services and chaining issues, the main goal of this thesis. In summary, our design process starts with the identification of the request categories the service is expected to offer and handle. For each request, the input parameters as well as the return data type(s) are identified.
For the purpose of simplicity, we use CGI to deliver the request to the server, hence specifying the inputs as pairs of (key=value) statements. Below we discuss two requests that the raster re-projection engine needs to support. The first request is ReProjImage (described in Section 3.3.2), which accepts requests for re-projecting a geo-referenced raster image from one coordinate system to another. The second is GetCapabilities (discussed in Section 3.3.3), which returns the capabilities supported by the service (it serves the same purpose as the one described in Appendix B.2).

3.3.2 Image Re-projection Interface

The ReProjImage request is used to transform a geo-referenced raster input image from its original reference system to another reference system. While working on the presented interface, we tried to work with current state-of-the-art terminologies and formats, instead of introducing new formats for projection, image or geo-referencing information. Although restricting, this approach allowed us to identify some of the limitations of these state-of-the-art technologies, which are discussed in detail in Section 3.5.1. Below we describe the minimal set of inputs needed for a ReProjImage request, commenting on the major choices and decisions made in our design process. A summary of the request inputs is provided in Table 3.1.

- The image to be transformed: This is the most essential input element needed in a re-projection service. A simple way to make an image available to a stand-alone re-projection service on a network is to send it from its storage location to the service with the rest of the transformation parameters. This approach is however not practical given that, as described earlier, the image would then have to be embedded within a CGI request. A more practical approach is to provide the re-projection service with the location from which it can retrieve the image. In an Internet setup, the location of the image would be encoded in the form of a URL.
Using a URL has several advantages: it makes the chaining of a re-projection service more flexible, as the URL is not limited to pointing to an image that is stored on disk. It can also embed a request to a map server to construct an image according to certain specifications. Hence, in a chain, instead of preparing a raster image and sending it to the ReProjImage service for re-projection, one can simply send the transformation parameters along with a request to a map server to construct the image. In this setting, using a URL in the ReProjImage request minimizes the transfer of images between services and does not constrain the service to working with "finished" maps. The URL parameter also makes the re-projection service more efficient, as it can request image constructions as they are needed in the process. This point is particularly important since it ensures "closure" amongst services. In other words, the re-projection service need not differentiate between a URL that contains a request for an image to be constructed and one that directly points to an existing image on a server. These two cases are treated identically and the difference is totally transparent to the re-projection service. This "closure" property will be revisited with examples in Chapter 4.

- The data format of the image: Given that the re-projection service cannot differentiate between a finished image (with a known image extension, such as http://tull.mit.edu/test.jpg) and one that needs to be constructed by another server (such as http://tull.mit.edu/REQUEST=map&BBOX=-97,24,78,36&WIDTH=560&HEIGHT=350&FORMAT=JPG), there needs to be a way to explicitly communicate the format of that image to the re-projection service. This data format parameter allows the server to know the type of the image's geo-referencing information, whether it is embedded in the image itself, as in the case of a GeoTIFF image, or requires additional inputs to be specified.
The alternative would be for the re-projection service to identify the image format by parsing the input image URL. This alternative is however neither scalable nor interoperable, as it requires services to be able to parse and interpret each other's protocols.

- The geo-referencing information: If the geo-referencing information of an image is not encoded within the image itself, it is usually saved in a separate text file created specifically for that purpose. Unfortunately, the formats of these files are currently vendor-dependent. For example, MapInfo uses a .tab format to describe the geo-referencing information of maps, while ArcView uses the world file format (.tfw, .tgw, .wld, .jgw, etc.). The multitude of formats available for representing this information poses a problem for the extensibility of our ReProjImage interface, and emphasizes the need for a standard for describing this geo-referencing information and associating it with its source raster imagery. Given the unavailability of such a standard, a re-projection service has to be able to parse and interpret several kinds of popular formats. Consequently, in order to understand the geo-referencing information associated with an image, two pieces of information are needed: the format of this information, and the information itself. When the geo-referencing information is specified in a separate file, there are several methods for sending this file or its contents to the re-projection service. For example, one can send the URL of the geo-referencing information file to the server along with the image. In this case, the server makes two requests: one for the image itself and one for its geo-referencing information. The advantage of this approach is that it is consistent with the one selected for the image element of the request; the geo-referencing information can also be constructed on demand with its associated image.
However, this approach still requires the server to understand and manipulate several formats of these geo-referencing files. Alternatively, one can extract the needed information from the file and send it using newly defined keys. This approach in fact calls for standardizing the way the geo-referencing information is communicated among servers. The new tags or keys consist of pairs of pixel locations and their corresponding geographic coordinates.

- The original and target reference systems: Specifying the values for the reference systems is a controversial issue. Although it is easier for users to define them as abbreviated strings (such as NAD83), there is currently no standard way of representing these strings. In order to avoid inconsistencies in chaining with WMT-compliant map servers, we decided to follow the WMT representations of reference systems, which were agreed on after an extensive debate among the group's participants. Accordingly, we will henceforth use the format namespace:projection identifier. A commonly used namespace is EPSG,1 which includes tables that define numeric codes for many projections and associate projection and coordinate metadata with each identifier. Namespaces can also be user-defined or vendor-specific.

- The resampling algorithm used: This parameter provides users and services with the flexibility of choosing a resampling algorithm that is appropriate for their specific needs.

- The pixel width and height of the resultant image: These parameters allow users and services to specify the pixel dimensions of re-projected images. An alternative way to convey the same information is to use a zoom parameter, which defines the geographic extent covered by a single pixel.

- The returned image format and the geo-referencing type: These are needed when users or services request the returned image and/or its geo-referencing information in formats that are different from the originals.
This presents a problem since, in response to each CGI request, the server can only return one result/file to the caller. Therefore, per CGI call, the server cannot send back the image and the geo-referencing information as separate files. One way to solve this problem is to encode these two pieces of information in one XML document. Indeed, the OpenGIS consortium is currently involved in defining the Geographic Markup Language (GML),2 an XML-based standard for representing geographic data. Another alternative is to use GeoTIFFs as a standard for geo-referenced raster imagery, as they embed coordinate information in the header of the TIFF file.

- A preferred return error or exception format: Consistent with the WMT approach, a standard way of communicating errors to the requesting client, and consequently to the user, is provided. An error is expressed either in the form of an XML document, in which the error and its severity are described, or in the same data type expected by the client (i.e., if the client requested the image in jpg format, then the user receives a jpg image with an indication of the error). Returning the exception in XML format is particularly useful when services are chained because it allows a meaningful propagation of errors throughout a chain to the user. Image exceptions, on the other hand, are appropriate when the client is expecting to receive an image regardless of the success of the underlying operation. Depending on the design of the client and the chain of services, a suitable exception format can be selected.

1. At http://www.petroconsultants.com/products/geodetic.html, EPSG has compiled a large number of coordinate systems and assigned a unique integer code to each one of them.
2. The Geographic Markup Language (GML) is an XML encoding for the transport and storage of geographic information, including both the geometry and properties of geographic features.
- Other inputs: In addition to the basic parameters described thus far, other inputs might be needed, such as version numbers, user/projection-specific parameters, or vendor-specific inputs that can be used to optimize the performance of a server when it is used in conjunction with a particular client. Of course, it is desirable to minimize the use of vendor-specific inputs in order to maintain the interoperability of services. Within the WMT setting, these inputs provide vendors with an opportunity to experiment with additional parameters. If these prove to generally improve the design or the performance of the service, they may later be incorporated into the standard. Another miscellaneous input might be used to store the color assigned to the pixels on the border of transformed images (see examples in Appendix C.5.3). Table 3.1 summarizes the parameters to the ReProjImage interface discussed thus far.

3.3.3 GetCapabilities Request

In a distributed environment, it is essential to have services follow a consistent scheme for describing their capabilities to interested clients. In the WMT context, this was accomplished using the GetCapabilities interface, which, upon request, returns to clients the characteristics of a server in an XML document. Querying servers for their capabilities through the GetCapabilities interface allows clients to decide, before using the service, whether it is suitable for their purposes. In Table 3.2, we list the characteristics most likely to be of interest to clients requesting a re-projection service.

Parameter & CGI name | Description | Values, Optional/Required
Image URL (imageURL) | Location of image to be transformed. | Required.
Image Format (format) | Format of image. | Required unless the extension of the image allows the extraction of the format.
GeoRef Type (gType) | Type of associated geo-referencing information. | Required (values include GeoTIFF, world, tab).
GeoRef Info (gInfo) | Method 1: send the URL of the geo-referencing information to the server. Method 2: extract the needed information and send it using new keys/tags, which hold selected pixel locations and associated geographic information on the image; at least 3 pairs are required for proper registration of the image. | Required unless the image is a GeoTIFF.
Width & Height (width, height) | Pixel width and height of transformed image. | Required.
Source SRS (fromSRS) | Namespace:projection identifier. | Required.
Target SRS (toSRS) | Namespace:projection identifier. | Required.
Resampling (resampling) | Resampling method picked by client. | Optional, defaults to nearest neighbor or approximation.
Output Image Format (rFormat) | Format of output image. | Optional, defaults to that of input image.
Output GeoRef Type (rGtype) | Type of geo-referencing information of output image. | Optional, defaults to that of input image.
Exception (exception) | Can be either an XML document or the same return type as a successful request. | Optional, defaults to the regular data type expected by the client (debated in WMT).
Other Inputs | Version number, vendor-specific parameters, border color, etc. | To be determined.

Table 3.1: Preliminary raster re-projection service query parameters.

Information | Discussion
Input and output projections supported by service | This information can be supplied in one of two forms. (a) The server returns two lists, one for supported source projections and one for supported target projections; in this case, the assumption is that ReProjImage handles the transformations from all source projections to all target projections. (b) The server returns a list of pairs of the source and target projections it supports; the only disadvantage of this method is that the list of pairs might get too long when a particular service supports a large number of re-projections, which implies a higher bandwidth needed to retrieve the capabilities.
Image formats supported by service | This can be returned as a list of the image types supported.
Geographic information types supported by service | Clients need this information to determine which format to send the data in, and to determine what options are available for the return formats.
Exception formats supported by service | Used to indicate whether the server returns the data in XML or not.
Resampling techniques supported by service | Since re-projection servers are not expected to be able to implement all available resampling techniques, this piece of information is particularly useful to clients.
Relative speed or accuracy at which service performs the re-projection | This information might be of interest to clients when choosing among several re-projection services available to them. It is expected that dedicated re-projection servers will be hosted on powerful workstations, hence considerably affecting the speed of service delivery. Similarly, some servers might provide faster service by applying approximations to the data, instead of applying more accurate yet more time-consuming transformations.

Table 3.2: Re-projection service capabilities parameters.

An illustrative example of a DTD corresponding to the above listing of capabilities consists of:

<!ELEMENT Capability (ProjectionPair+, ImageFormat+, GeoInfo+, Resampling+, Speed?, Exception?)>
<!ELEMENT ProjectionPair EMPTY>
<!ATTLIST ProjectionPair
    Source CDATA #REQUIRED
    Target CDATA #REQUIRED
    Reverse (yes|no) "yes">
<!ELEMENT ImageFormat (gtif|tif|gif|jpg|png|other)>
<!ELEMENT GeoInfo (tab|world|geotiff|other)>
<!ELEMENT Resampling (nearest|bilinear|cubic|other)>
<!ELEMENT Speed (slow|normal|fast)>
<!ELEMENT Exception (default|XML)>

Examples for using both the ReProjImage and the GetCapabilities interfaces are included in the next section, which summarizes our experience with developing a working raster re-projection service.
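To make the capabilities listing concrete, a hypothetical response to a GetCapabilities request might look as follows. This instance only loosely follows the illustrative DTD above (leaf elements such as gtif or bilinear are not declared there), and the specific projection pair shown is an arbitrary example:

```xml
<!-- Hypothetical GetCapabilities response for a re-projection service -->
<Capability>
  <ProjectionPair Source="EPSG:26986" Target="EPSG:4269" Reverse="yes"/>
  <ImageFormat><gtif/><jpg/><png/></ImageFormat>
  <GeoInfo><world/><geotiff/></GeoInfo>
  <Resampling><nearest/><bilinear/></Resampling>
  <Speed><normal/></Speed>
  <Exception><XML/></Exception>
</Capability>
```

A client would fetch such a document once, confirm that the (fromSRS, toSRS) pair it needs appears among the ProjectionPair elements, and only then issue its ReProjImage request.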
3.4 Re-projection Service: a Prototype Implementation

The most practical way to evaluate and refine a preliminary design is through a prototype implementation of that design. In this section, we describe our attempt to build a raster re-projection service prototype complying with the interfaces described thus far. The prototype is tested in two settings: functioning as a stand-alone service and interacting with map servers. The section highlights the unanticipated difficulties faced in this phase and how they allowed us to gain valuable insights into some practical constraints and requirements related to distributed GIS. We remind the reader that the objective of this effort does not involve implementing a 'perfect' re-projection service that is capable of handling a wide range of transformations, formats and interpolation techniques. Nor is the objective to deliver re-projection through newly devised algorithms. For this reason, our implementation is almost exclusively based on using and/or adapting existing re-projection tools and components. The goal for the initial implementation is for the prototype to support a meaningful subset of the input variations, which can later be easily expanded to create the 'perfect' service. However, as we show later, our decision to use available tools and components introduced its own set of problems and constraints, transforming our initially anticipated simple implementation step into a complex task. Nonetheless, the unanticipated obstacles encountered are indicative of the limitations of current GIS technologies, and hence shed light on areas that require improvement.

3.4.1 Implementation Options

There are many tools and components available in the GIS field for performing raster re-projection. Understandably, most of these tools have interfaces that are optimized for their internal operations and proprietary structures.
Our decision to use ArcInfo 6.0 with RPC as the back-end server, despite its many problems (such as licensing restrictions and slow raster projection speed), was a result of its availability for testing at the time of the prototyping, its support for a wide range of image formats, the ease of running it as a server, and our familiarity with its operations. Other choices available to us included:

- Using portions of proprietary GIS systems, like the ArcInfo ODE (Open Development Environment) interface: The ODE environment is designed to allow developers to pick and customize subsets of ArcInfo functionality, which can then be treated as independent components. Indeed, ODE allowed us to access and use the re-projection functionalities of ArcInfo from outside the ArcInfo environment. The only problem we encountered during that process was the enormous amount of memory needed (30 MB on average on a PC or workstation) for each re-projection call, even for relatively small images. This implied that the prototype would lack scalability whenever the available memory is insufficient for handling simultaneous re-projection requests. In addition, being based on ArcInfo, whose raster re-projection process is often slow, the provision of the service was also slow. Table 3.3 provides some sample re-projection times for gray-scale images. Finally, the ODE option was also restricting in terms of licensing terms and fees, making the solution rather expensive in a distributed setting where the service is continually used by multiple clients.

Image Size | UltraSparc IIi, 333 MHz, 128 MB memory | Dell 410 workstation, dual 550 MHz Pentium IIIs, 512 MB memory
2 KB | 2 sec. | 2 sec.
27 KB | 5 sec. | 3 sec.
72 KB | 30 sec. | 18 sec.
300 KB | 32 sec. | 19 sec.
631 KB | 1 min. 25 sec. | 45 sec.
10 MB | > 1 hour | 20 min.

Table 3.3: Sample re-projection times using ArcInfo.
- Using portions of open source GIS code, like GRASS: Written in C and available for free, GRASS would relieve us from the licensing constraints that are associated with commercial proprietary systems. Unfortunately, the GRASS alternative was not without problems. On one hand, the re-projection module r.proj is known to be slow and only re-projects data in the GRASS raster format. On the other hand, we encountered some difficulties in the process of extracting the re-projection code from the surrounding GRASS code and environment, given the tight interconnection between the two.

- Using available proprietary re-projection components, like the BlueMarble Geographic GeoTransform components: BlueMarble Geographic (www.bluemarble.com) offers a simple solution in the form of two geographic transformation DLLs that can be incorporated by developers into any application. Although attractive, this option suffers from licensing problems as we try to distribute these components on the Internet.

- Writing our own re-projection code: There always remains the alternative of writing our own re-projection code from scratch. The attractiveness of this option lies in the possibility of discovering faster and more efficient algorithms for re-projection in the process. However, as this is not the focus of this study, this approach was not favored, given the time and resource requirements for such an endeavor.

The next section provides an overview of the final prototype developed.

3.4.2 Prototype

A flowchart of the prototype developed is shown in Figure 3.5. A more complete listing of its capabilities and limitations is included in Appendix C.4. A sample call to this service is shown below. It includes a request to re-project a jpg image (shown in Figure 3.2) from the Mass State Plane to the Lat/Lon reference system. The output of the request is shown in Figure 3.3.
cgi?imageURL=http://tull.mit.edu/test.jpg&gType=world&gInfo=http://tull.mit.edu/test.jgw&fromSRS=EPSG:26986&toSRS=EPSG:4269&resampling=bilinear&width=500&height=500 (1)

1. In practice, the imageURL would have to be encoded in this request.

Figure 3.2: Original jpg image in the Mass State Plane reference system.

Figure 3.3: Image re-projected into lat/long using ArcInfo.

Our implementation worked well in terms of achieving our prototyping goals. However, it displayed several performance limitations, many of which were a result of our decision to use ArcInfo version 6. For example, some performance problems were linked to our use of RPC (Remote Procedure Call) methods to access ArcInfo's re-projection commands. With the RPC connection, ArcInfo can only handle one call at a time, which implied that any re-projection request received at the server had to wait in a queue until all requests ahead of it were processed. The resultant delay in processing simultaneous or overlapping requests was a clear disadvantage of the ArcInfo RPC solution. Requests were queued in the order they arrived, regardless of the size of the image or its estimated processing time. The other shortcoming of using ArcInfo was related to the software's inability to stream images, hence requiring every image to be fully downloaded to disk before initiating the re-projection sequence. Furthermore, ArcInfo's re-projection modules were relatively slow when handling large images (see Table 3.3), a serious disadvantage in a distributed setup where the combination of reliable performance and fast response is valuable. This is especially true in cases of service chaining, where a delay in one service will affect the performance of the overall chain. On one hand, the poor performance of the re-projection modules can be generally attributed to the underlying algorithms used in ArcInfo.
On the other hand, the fact that these modules performed re-projection on ArcInfo grids only dictated that we introduce time-consuming image-to-grid and grid-to-image conversions in order to be able to manipulate and re-project the images.2 However, after observing the prototype's input/output patterns for several examples, we concluded that we could significantly improve its speed if we were willing to accept some loss in the accuracy of the results. This can be achieved by substituting the exact yet slow re-projection step with another one that delivers a tolerably approximate result in a much shorter time. The approximations were obtained by applying linear transformations to the images. These transformations were derived from the pixel and geographic dimensions of the image in the source and target reference systems (see Appendix C.5). Figure 3.4 shows the result of our approximate algorithm applied to the image in Figure 3.2. Comparing it with Figure 3.3, we argue that the difference between the approximate image and the exact one is barely noticeable to the naked eye. In Appendix C.5, we show additional examples of exact-approximate pairs of re-projected images. All these examples collectively demonstrate that for most typical applications involving browsing small areas at a time, the trade-off between speed and accuracy will be more than acceptable.

1. Besides the ODE approach, RPC is the only way ArcInfo can be set up to run as a server.
2. After the release of ArcInfo 8, with improved projection code, we found that the grid-to-image and image-to-grid conversions remain the bottleneck in the re-projection process. ArcInfo 8 introduces a new command, projectGrid, which applies a best-fit polynomial to re-project a grid, rather than projecting every individual pixel.

Figure 3.4: Image re-projected to lat/lon using approximation method.
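As a sketch of this speed/accuracy trade-off, the following is a minimal nearest-neighbour resampler in which the geographic-to-geographic relation is approximated as purely linear, so the pixel mapping collapses to a rescale of the grid. This is an illustration of the idea only, not the thesis's actual algorithm (which derived its linear transformations from the exactly projected image corners):

```python
def linear_reproject(src, dst_w, dst_h):
    """Approximate re-projection as a linear pixel mapping.

    src is a 2-D list of pixel values. Because the relation between the
    target and source geographic coordinates is assumed linear, each
    output pixel maps straight back to a source pixel by scaling, with
    no per-pixel projection math -- which is where the speed comes from.
    (Simplified sketch; not the thesis's exact method.)
    """
    src_h, src_w = len(src), len(src[0])
    out = []
    for j in range(dst_h):
        sy = min(int(j * src_h / dst_h), src_h - 1)  # source row
        row = []
        for i in range(dst_w):
            sx = min(int(i * src_w / dst_w), src_w - 1)  # source column
            row.append(src[sy][sx])  # nearest-neighbour pick
        out.append(row)
    return out
```

An exact re-projection would instead invert the map projection for every output pixel before resampling, which is precisely the per-pixel work the approximation avoids.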
In the next section, we summarize the results of chaining the prototype re-projection service with the MITOrtho server (introduced in Section 1.2.2).

3.4.3 Chaining Prototype with MITOrtho Server

In order to familiarize ourselves with the intricacies of service chaining, we experimented with chaining our re-projection service prototype with the MITOrtho server. As explained in Section 1.2.2, the MITOrtho server provides an interface for extracting customized geo-referenced images of the greater Boston area. Currently, the MITOrtho server is limited to serving images in the Mass State Plane reference system, the reference system in which the images are stored. Our goal was to build a new service, MultiProjOrtho, that can deliver the images in other reference systems selected by users. The MultiProjOrtho service can be viewed as an intermediate service coordinating between the re-projection service and the ortho server, as shown in Figure 3.6. It allows users to download chosen images by providing the desired SRS for the image, the geographical bounding box expressed in units of that SRS, the image's data format and its pixel dimensions. As shown in Figure 3.6, both the re-projection step and the call to the MITOrtho server are completely transparent to the caller of the service. Our comments regarding this elementary chaining experiment are presented in the next section, which synthesizes our observations and findings.
Figure 3.5: Prototype re-projection service: internal flowchart. Inputs to the ReprojRaster service: image URL, geo-referencing info, source SRS (fromSRS), target SRS (toSRS), resampling method and output zoom; output: the re-projected image, with or without its geo-referencing information. All parts were coded in Perl except where indicated. The main steps are:

- Read input parameters and check for errors (1)
- Convert EPSG codes into ArcInfo terminology
- Download the image and save it to disk (2)
- Connect to ArcInfo via RPC (coded in C) (3)
- Construct a .prj file for the current projection (5)
- Transform the input image to a grid (6)
- Run the proj script in ArcInfo (coded in AML): project the grid
- Transform the re-projected grid (or stack) back to an image
- Prepare headers and send the re-projected image to the caller (4)
- Clean up the intermediary files and images

1. This stage includes checking for missing parameters and detecting inconsistent information. This is done to ensure a consistent behavior in case of errors. Consistent yet sometimes forgiving behavior becomes a necessity when considering that services will be continuously chained and interacting.
2. Download the geoinfo if required. Because of ArcInfo's inability to deal with streamed images, the image has to be fully on disk before it can be manipulated.
3. A new connection is created for each re-projection request. When multiple requests are received, ArcInfo queues the requests and processes them in the order they were received.
4. The default is to send only the image (jpg, tif or gif) to the user of the service. When the user requests the geo-referencing information of the re-projected image to be returned as well, the service can either return a geotiff or an xml document (containing both the image and the geoinfo), depending on the user's choice.
5. A .prj file contains all the projection parameters needed for ArcInfo to project the current image.
6. In case of multiband images, ArcInfo transforms the image into a stack of grids, one for each band. Bands are re-projected separately and then combined.
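The flow in Figure 3.5 can be condensed into the following sketch. The fetch and project callables stand in for the download step and for the grid-convert/project/convert-back sequence that the prototype ran inside ArcInfo over RPC; the actual prototype was written in Perl, so everything here is illustrative:

```python
def handle_reproj_request(params, fetch, project):
    """Illustrative sketch of the internal flow of Figure 3.5.

    fetch(url) downloads the image fully to disk (ArcInfo could not
    stream images); project(image, src, dst, resampling) stands in for
    the image-to-grid, project, grid-to-image sequence run over RPC.
    Both are injected so the sketch stays self-contained.
    """
    # (1) Check for missing parameters and inconsistent information.
    for key in ("imageURL", "fromSRS", "toSRS"):
        if key not in params:
            raise ValueError(f"missing parameter: {key}")
    # Convert the EPSG codes into the back end's terminology; here we
    # simply strip the namespace (the prototype built ArcInfo .prj files).
    src = params["fromSRS"].split(":", 1)[1]
    dst = params["toSRS"].split(":", 1)[1]
    # (2) Download the image (and geo-info, if any) fully to disk.
    image = fetch(params["imageURL"])
    # (3)-(6) Re-project; intermediary cleanup is left to the caller here.
    return project(image, src, dst, params.get("resampling", "nearest"))
```

Because the fetch step only sees a URL, a caller can pass either a finished image or an embedded map-server request, preserving the "closure" property discussed in Section 3.3.2.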
Figure 3.6: Chaining the re-projection service with the MITOrtho server: interaction flowchart. The MultiProjOrtho service accepts a bounding box (gxmin, gymin, gxmax, gymax), an SRS, an image format, and pixel width and height (pwidth, pheight).

Preparation step: determine the URL of the image on the ortho server that corresponds to the image requested by the user. In order to do that, we have to determine the equivalent values for the bounding box and the pixel width and height of the image on the ortho server. The four corners of the requested bounding box are projected into the ortho server's SRS (using the Proj utility), yielding (minX, minY) and (maxX, maxY). If the units of measurement are the same for the requested SRS and the ortho server SRS, then the zoom of the image (geographic distance per pixel) remains constant across the SRSs:

zoomX = (gxmax - gxmin) / pwidth  =>  newWidth = (maxX - minX) / zoomX
zoomY = (gymax - gymin) / pheight =>  newHeight = (maxY - minY) / zoomY

The MultiProjOrtho service then passes the constructed imageURL (a request to the Mass Ortho server for a geotiff covering (minX, minY, maxX, maxY) at newWidth by newHeight pixels) to the ReprojRaster service, requesting a geotiff in return.

Extraction step: since we obtained an area larger than the one requested, we need to extract the area of interest from the image returned by the re-projection server. After reading the geographic width and height of the returned image (using its geotiff header), and knowing the actual geographic and pixel width and height of the requested image, we can easily find and cut the area of interest (using tiffcut) and return it to the user.

3.5 Synthesis of Observations and Findings

This section provides a synthesis of the lessons learned from our prototyping effort, vis-a-vis the challenges of realizing a flexible and scalable distributed geo-processing infrastructure that is also easy to manage and use.
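The preparation-step arithmetic of Figure 3.6 above can be sketched as a small helper. The projected corner coordinates are assumed to come from an external projection routine (the prototype used the Proj utility), so only the zoom bookkeeping is shown:

```python
def ortho_request_dims(req_bbox, req_size, ortho_bbox):
    """Pixel dimensions to request from the ortho server so that the
    per-pixel zoom matches the user's request (same units assumed in
    both SRSs, per the Figure 3.6 preparation step).

    req_bbox   = (gxmin, gymin, gxmax, gymax) in the requested SRS
    req_size   = (pwidth, pheight) requested by the user
    ortho_bbox = (minX, minY, maxX, maxY): the request corners projected
                 into the ortho server's SRS by an external routine
    """
    gxmin, gymin, gxmax, gymax = req_bbox
    pwidth, pheight = req_size
    minX, minY, maxX, maxY = ortho_bbox
    zoom_x = (gxmax - gxmin) / pwidth    # geographic distance per pixel
    zoom_y = (gymax - gymin) / pheight
    new_width = round((maxX - minX) / zoom_x)
    new_height = round((maxY - minY) / zoom_y)
    return new_width, new_height
```

Because the projected bounding box generally covers a larger area than requested, the resulting image still needs the extraction (cut) step described above before being returned to the user.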
Most difficulties encountered in this effort were the result of the limitations of legacy GIS systems, and of the lack of standards for geo-referenced images and SRS representations. We classify our observations and findings into four categories:

1. Standards and interoperability issues
2. Inherent GIS design issues
3. Distributed infrastructure and service chaining issues
4. Distinctive characteristics of the GIS problem.

Where appropriate, the discussion of these issues is linked to related issues identified in the first two chapters, especially those pertaining to the range of users and applications, the available enabling technologies and the related ongoing interoperability efforts. The synthesis of all these issues will lead, in Chapter 5, to the construction of an analysis framework within which we outline infrastructure requirements, identify key elements and players, and compare candidate architectures.

3.5.1 Standards and Interoperability Issues

Our prototyping experience revealed the lack of standards for at least two operations: the representation of projection systems and the encoding of geo-referenced raster imagery. As a result of the lack of a standard representation of SRSs, the prototyping effort required conversions to and from three different representations, namely the EPSG codes, ArcInfo's internal string representations and those of the Proj utility (from the cartographic projections library). To ensure minimum functionality for the prototype, it became our responsibility, as developers, to perform the back-and-forth SRS translations across these representations. We found it necessary to limit the set of supported projections, to avoid turning our main task of building an interoperable service into that of building a super translator of projection representations. The list of supported projections is shown in Table C.1.
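The translation burden described above amounts to the service keeping its own mapping between representations. A minimal sketch (Python) follows; the two EPSG codes are real codes used elsewhere in this thesis, but the ArcInfo and Proj strings shown here are illustrative assumptions, not verbatim syntax from either system:

```python
# Sketch: with no standard SRS representation, the service maintains its
# own table mapping EPSG codes to the other representations it must emit.
# The ArcInfo and Proj strings below are illustrative assumptions.

SRS_TABLE = {
    "EPSG:26986": {  # Massachusetts State Plane (NAD83)
        "proj": "+proj=lcc +datum=NAD83 +units=m",
        "arcinfo": "STATEPLANE MASSACHUSETTS MAINLAND NAD83",
    },
    "EPSG:4269": {   # geographic lat/lon on NAD83
        "proj": "+proj=longlat +datum=NAD83",
        "arcinfo": "GEOGRAPHIC NAD83",
    },
}

def translate_srs(epsg_code, target):
    """Translate an EPSG code to a target representation, or raise a
    clear error so the caller can report an unsupported projection."""
    try:
        return SRS_TABLE[epsg_code][target]
    except KeyError:
        raise ValueError("unsupported SRS or representation: %s -> %s"
                         % (epsg_code, target))
```

Limiting the table to a small set of projections is exactly the scoping decision described above: every additional entry must be verified against three systems by hand.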
It is interesting to note that, in the OpenGIS testbeds, the EPSG codes have been heavily used, almost to the point of becoming the de facto standard. Yet GIS vendors such as ESRI, who have been active participants in these testbeds, have not yet implemented mechanisms for their software to interpret these codes. This observation reinforces the discussion of a standard's acceptance (Section 2.2.2) as a critical prerequisite to its success. Without standards, interoperability becomes fragile and could easily break with new technological advancements. A distributed infrastructure built under these conditions would be hard to scale. This issue emphasizes the importance of consensus-based organizations such as OpenGIS, because of their role in establishing standards.

1. http://www.remotesensing.org/proj/

Our prototyping experiment also highlighted the lack of a standard for representing geo-referenced raster imagery. At the time of the implementation, the GeoTIFF format appeared to be the de facto standard in that area, as most GIS software can read and produce images in variations of that format. Going forward, however, we believe that an XML-based format similar to the currently-under-revision GML (Geography Markup Language) would better fit the extensibility requirements of the distributed environment. An XML-based format is also better aligned with the current approach of the industry as represented by OpenGIS. The examples below illustrate how an XML-based format allows for a "clean" separation between the image and its geo-referencing information. The latter can be either represented inline in the XML document or as an XLink pointing to a file containing that information. By allowing the geo-referencing information to be independent of the associated image format, the XML structure offers more flexibility to users.
The XML documents can be easily filtered by users to extract only the information they are interested in for their specific purposes.

Inline geo-referencing information:

<GeoReferencedRasterImage>
  <Content type="image/gif" xlink:href="http://my.server.net/image.gif"/>
  <GeoInfo>
    <SRSName>EPSG:26986</SRSName>
    <width>500</width>
    <height>500</height>
    <xcenter>231000</xcenter>
    <ycenter>970000</ycenter>
    <gwidth>230000</gwidth>
    <gheight>90000</gheight>
  </GeoInfo>
</GeoReferencedRasterImage>

Pointers to external geo-referencing information:

<GeoReferencedRasterImage>
  <Content type="image/gif" xlink:href="http://my.server.net/image.gif"/>
  <GeoInfo type="world" xlink:href="http://my.server.net/image.gfw"/>
</GeoReferencedRasterImage>

Figure 3.7 Encoding geo-referenced imagery using XML.

1. These XML-based examples are very preliminary. Practical usage would require additional elements and attributes within this basic structure.

We should point out that, at the time of this writing, a group within the WMT (see Section 2.3.4) is working on a draft of coverage extensions for GML that cover the case of geo-referenced raster imagery. Our impression of the draft is that it tries to be so comprehensive that it risks becoming too complex even for simple cases like ours. It remains to be seen if, when and for what purposes the specification will eventually be adopted. Our preview of this under-construction standard reinforces our understanding of the challenges of the standards development process (Section 2.2.2), especially those related to timing and scope.

3.5.2 Inherent GIS Design Issues

In addition to the lack of standards, building the prototype was complicated by several limitations of current GIS systems in accommodating a distributed, multi-user setup. Although we encountered these obstacles while using ArcInfo, our experience with other systems suggests that the limitations generalize to most legacy GIS software.
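The filtering described above — a client extracting only the fields it needs from an inline-georeferenced document — can be sketched with Python's standard XML parser. The element names follow the hypothetical format of Figure 3.7; the xlink namespace is omitted here (a plain href attribute is used) to keep the sketch short:

```python
import xml.etree.ElementTree as ET

# Sketch: filter a GeoReferencedRasterImage document (the hypothetical
# inline format of Figure 3.7) down to the fields a client cares about.
# The xlink namespace is replaced by a plain href attribute for brevity.

DOC = """<GeoReferencedRasterImage>
  <Content type="image/gif" href="http://my.server.net/image.gif"/>
  <GeoInfo>
    <SRSName>EPSG:26986</SRSName>
    <width>500</width><height>500</height>
    <xcenter>231000</xcenter><ycenter>970000</ycenter>
    <gwidth>230000</gwidth><gheight>90000</gheight>
  </GeoInfo>
</GeoReferencedRasterImage>"""

def extract_geoinfo(xml_text):
    """Return only the SRS, image location and geographic extent."""
    root = ET.fromstring(xml_text)
    geo = root.find("GeoInfo")
    return {
        "srs": geo.findtext("SRSName").strip(),
        "image": root.find("Content").get("href"),
        "gwidth": float(geo.findtext("gwidth")),
        "gheight": float(geo.findtext("gheight")),
    }
```

Because the geo-referencing information is ordinary XML rather than a binary header, a client needing only the extent never has to understand the image format at all.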
An example of such limitations is the inability of ArcInfo (and most of today's standard GIS packages) to connect to data (and services) over the Internet through a TCP/IP connection. As an integral part of the operation of the prototype, this functionality had to be separately coded in Perl (see Figure 3.5). Additionally, the lack of modularity in the GIS system made it difficult to isolate and independently access ArcInfo's re-projection modules (even using ODE, as discussed in Section 3.4). This limitation later jeopardized the prototype's scalability and robustness when dealing with concurrent re-projection requests.

Given the traditional uses of GIS as stand-alone systems, these limitations are understandable. The new functionalities required by the distributed setup stretch the capabilities of the traditional systems beyond what they were originally designed to do. Indeed, the single-user presumption explains ArcInfo's non-reentrant code and personal workspace characteristics, as well as the underlying assumption that all data are available and accessible locally from within these workspaces. Consequently, in order for us to use such a system in a distributed, multi-user setup, users' requests had to be queued in the order they arrived rather than processed in parallel. This setting implies a longer average waiting time for users. Likewise, images had to be fully downloaded to disk before the re-projection operations could start, as opposed to being streamed in and incrementally processed.

Given these limitations, there is a real necessity for a fundamental change in the design of GIS software for it to be practical in a distributed environment. Indeed, we are beginning to see this trend emerging. For instance, ESRI is releasing a redesigned object-oriented version of ArcInfo,1 and new players, such as BlueMarble Geographic, are filling the need for simple specialized GIS components. Furthermore, the practicality, ease-of-use and sustainability of the distributed infrastructure impose minimum performance and service delivery requirements. These requirements can best be addressed by services designed to support multi-user and multi-threaded requests, streaming data, and caching of query results, among others. The next section takes a practical look at these issues.

1. Over the last couple of years, ArcInfo has been completely redesigned and redeveloped to leverage object-oriented and component-based technologies. However, the fact that the project took more than a year longer than its team expected suggests that the original system (the one used for our work) was very interleaved and non-object-oriented.

3.5.3 Distributed Infrastructure and Chaining Issues

For a successful wide-based deployment of a distributed geo-processing infrastructure, the infrastructure needs to meet both technological and ease-of-adoption criteria. Thus far, our focus has been on identifying the key technological criteria of scalability, extensibility and interoperability. As for ease-of-adoption, its importance became clear to us through our exposure (albeit limited) to service chaining in Section 3.4.3. We conclude that there are three key prerequisites to wide-based user adoption of a distributed geo-processing infrastructure:

1. Performance and reliability of services
2. Ease of access to services and their capabilities
3. Simplicity of service chaining.

The three prerequisites are discussed below.

1. Performance and reliability of services

To users of the distributed infrastructure, the performance and reliability of individual and chained services are critical. In this context, performance denotes the capacity of services to produce desired results with minimum expenditure of time and resources.
A common measure of performance for distributed systems is the response time of a service together with its throughput [87]. Service reliability, in turn, refers to the availability and trustworthiness of services, as well as the currency of any data they serve.

In terms of performance, users will expect the distributed solution to perform at least as well as their current systems. Even though the distributed setup provides users with the flexibility to assemble a solution from a breadth of services, there will be little incentive to use it if that comes at the expense of inferior overall performance. The issue of reliable performance deserves special attention in the case of mobile users, whose mobility can quickly make location-based requests obsolete or inconsistent if they are not processed and delivered fast enough. Moreover, in Section 3.4.3, we observed that the overall performance of a chain of independently-provided services is only as good as its "weakest link". The performance of individual services can be improved by optimizing the underlying algorithms, or by hosting (and/or replicating) services on specialized dedicated machines. For example, the performance of a computationally-intensive, memory-demanding service such as the re-projection service can be considerably enhanced if hosted on a multiprocessor machine with ample RAM (for handling a larger number of larger images) and fast input/output pipes (for downloading and returning large images). The re-projection case study also demonstrates how performance can be intelligently enhanced by applying an algorithm that delivers an approximation of the re-projection in a fraction of the time required to perform the exact operation. This, however, might come at the expense of reduced data accuracy and quality, which may or may not be an acceptable compromise, depending on the data's application context.
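The two performance measures named above — response time and throughput — can be made concrete with a small timing harness. This is a sketch; the "service" is a stand-in function, and requests are issued sequentially, as they would be against a queued, non-reentrant backend:

```python
import time

def measure(service, requests):
    """Call `service` once per request, sequentially, and return
    (average response time in seconds, throughput in requests/second)."""
    times = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        service(r)                              # the call being measured
        times.append(time.perf_counter() - t0)  # per-request response time
    elapsed = time.perf_counter() - start
    return sum(times) / len(times), len(requests) / elapsed
```

Replacing the sequential loop with a pool of workers (against a re-entrant service) raises throughput without changing per-request response time, which is exactly the distinction the two metrics are meant to capture.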
As each context encompasses different requirements for functionality and service level, there will be other trade-offs, such as that between the breadth of functionality provided by a service and its degree of specialization/depth (see Section 2.2.2), which is also addressed in the next section.

In terms of reliability, data authenticity deserves special attention. It is easy to see why users' need for data authenticity favors centralized or federated architectures where services are registered, certified and rated by designated authorities. The need for data authenticity also favors strongly-branded established organizations and trustworthy service providers. Chapter 4 explores these issues in greater detail.

2. Ease of access to services and their capabilities

Another practical consideration in using the distributed infrastructure is the ease with which users can access and locate the data and services they need. The ease of access to services is proportional to the degree of interoperability and the simplicity of the services' interfaces. Indeed, the interoperability of services guarantees, by construction, that they are easily interchangeable within applications. Section 2.2 covered some of the challenges embedded in achieving this objective. As for the simplicity of the services' interfaces, it is a particularly desirable feature for the expanding non-professional GIS user base. However, as described in Section 2.2.2, simplicity involves its own trade-offs. If an interface is simple enough to be used by a wide range of users for a variety of applications, it might lack the depth needed for more advanced applications. If, instead, it is drafted to accommodate this depth of functionality, it risks becoming more difficult to use for the simpler applications (recall our discussion of the coverage extension to GML in Section 3.5.1).
The other factor influencing the ease of access to services is the ease with which users can locate these services and learn about their capabilities. This is where an interface such as the getCapabilities interface (described in Appendix B.2) contributes to the big picture, as a standard method for querying services about their underlying functionality and data. Within the context of our earlier discussion of performance issues, this interface could also be used by services to advertise their degree of accuracy, their speed, their current load, and other performance-related metrics. As with any distributed environment, it will become increasingly difficult to locate and keep track of services as the number of available services grows. In Chapter 4, we address how this problem can be alleviated by introducing facilitator services (such as catalogs, directories and search engines) which can be queried and browsed by clients to locate what they need.

3. Simplicity of service chaining

As discussed in Section 1.2.3, service chaining is the main source of complexity for the users of a distributed geo-processing infrastructure. In order to make the infrastructure usable to as wide an audience as possible, it should be possible to hide the complexity of chaining. In this chapter, we saw how the design of the ReProjImage interface facilitated its chaining with map services, such as the MITOrtho server, by accepting the image parameter as a URL. This URL can be used to request a customized map from the map server. Nevertheless, as the number of services in a chain grows, it becomes more difficult to keep track of the various service interfaces, capabilities, locations and intermediate results, among other factors. In this case, intermediary mediating services can be introduced to coordinate between the user's preferences and the capabilities of available services, and to transparently construct the chain of services that matches the client's need.
In this context, the mediating services handle the dialogs among the involved services and the exchange and transformation of data, and keep track of metadata and accounting. In doing so, the mediating services achieve the objective of hiding some of the details of service chaining from the user. The next chapter covers these mediating services, their roles and their variations in detail.

3.5.4 Distinctive Characteristics of the GIS Problem

Our exposure to service design and chaining in this chapter, together with the background research covered in Chapter 2, allowed us to ascertain some distinctive characteristics of the GIS problem which differentiate it from other distributed infrastructures. Those characteristics are listed in Table 3.4, and are discussed in detail in this section.

Characteristics Extracted from GIS Literature  | Characteristics Extracted from Prototyping Experience
Underlying complex data structures             | Legacy starting point
Distributed, multi-disciplinary nature of data | Coordinate transformations
Diversity of spatial information               | Metadata
Interdependence of geo-entities                | Interactive use of data
Complexity of operations                       | Interest in archived data
Semantics                                      | Size of data (length of transactions)

Table 3.4: Distinctive characteristics of the GIS problem.

The GIS literature attributes the uniqueness of the GIS problem to the distributed and multidisciplinary nature of geographic information [46], as well as to the complexity of the multi-dimensional data structures used to represent that information [45]. The complexity of the operations performed on spatial data is further exacerbated by the interdependence of geo-entities and the propensity of proximate locations to influence each other and possess similar attributes. For a wide range of analyses, a single layer is often not sufficient, and its usefulness is greatly enhanced when linked or merged with other data sets.
Even when these sets are interoperable, semantic heterogeneity, as the literature stresses, is a major limitation on the re-use and sharing of this data. Differences in semantics and terminology make it difficult to recognize the various relationships that may exist between similar or related objects from different sources [52].

Our prototyping experience confirmed the general thinking in the literature, from the perspective of the deployment and use of GIS data in the specific context of a distributed geo-processing infrastructure. Furthermore, in many cases, this perspective highlighted certain practical issues that are likely to require special attention in a distributed setting. For instance, in Section 3.5.1, we experienced first-hand the additional effort required to overcome the limitations of GIS legacy systems and to accommodate heterogeneous data models and representations. The prototyping experience also uncovered an additional level of complexity incurred as a result of projection system transformations. On one hand, we learned that, even when data interoperability is possible, data are only useful if they are in the coordinate system of the client requesting them. On the other hand, we happily noted that, with fairly simple transformations applied to the data (such as changing its format, scale, quality or coordinate system), the scope of applications of that data can be significantly broadened. For these reasons, the use of GIS data lends itself to a more service-based model, whereby data and transformations are independently selected to create customized solutions.

Another distinguishing characteristic of GIS data derives from the way this data is used and manipulated by users. For most applications, viewing, zooming and panning across a map is not enough. Users are typically also interested in the metadata behind the data, the constituent layers and individual features.
The process of working with maps is hence quite interactive. Keeping track of the interactions, the states and the intermediate results is both necessary and challenging. Sometimes the interaction is even more involved, as in the case of an address matching service returning more than one match for an input address (which is not uncommon). Furthermore, consider the example of a user with a handheld device accessing local maps. The user is unlikely to be satisfied with the first map he receives. He will likely request additional layers for the same area, zoom in and out, and pick certain features to obtain more information about. In this setting, the user is repeatedly retrieving tiny subsets of archived data sets, and may not be concerned about the absolute accuracy of the data, so clever compromises between performance and accuracy are warranted. Also, subsequent requests are likely to be based on data already downloaded. Such localized use of the data calls for the ability to categorize information according to its geographic location and to find information based on that location (as proposed in the .geo proposal summarized in Section 2.3.6). Additionally, GIS data can be returned to users at one of three levels: as a picture, as a collection of features, or as a coverage. The way such data is interpreted and combined, and the resulting complexity, changes depending on the level used in an application. Finally, compared with other mobile services, such as getting stock quotes, GIS-based mobile services are likely to require more bandwidth; accordingly, clever data compression techniques are called for. The conclusion is that with GIS data, there is "no size that fits all". That is why flexibility is highly needed to serve different types of users, for different purposes, with varying degrees of interactivity.
To corroborate this point, the WMT now has two interfaces: getMap, which simply returns a map (picture), and getCoverage, which returns the raw data so that users can manipulate the data without going to the server for each transaction. The key is to give users enough options to mix and match in order to fit their needs.

In summary, this chapter described our experience with designing and developing a prototype image re-projection service. This experience reinforced and complemented some of the basic issues, requirements and challenges of building and using a distributed geo-processing infrastructure. In the next chapter, we use the issues identified thus far to create a framework within which candidate architectural configurations supporting the distributed infrastructure may be analyzed and contrasted.

Chapter 4
Architectures: Components, Chaining and Issues

4.1 Overview

In this chapter, we use the set of issues identified in Chapters 2 and 3 to analyze alternative architectures for facilitating the seamless integration of distributed interoperable services. For each architecture, we determine the constituent components, study the resultant chaining process and highlight the trade-offs involved. In addition, we identify the key players expected to influence the evolution of these architectures, and the niche markets likely to develop around each of them.

4.2 Approach

An architecture is defined as the partitioning of a system into major components or independent modules [104]. Hence, the architectures presented in this chapter are best identified and differentiated by the types of components they consist of, and by the distribution of chaining management tasks among these components. The architectures are presented in an incremental fashion, each building upon and adding components to those discussed before it.
From the client's perspective, each architecture we present offers a higher level of abstraction, by increasing the transparency of service chaining.

From our discussion in previous chapters, we learned that the needs of different sets of users are likely to be met by different architectures, depending on the nature of their environment and the specifics of their tasks. We also saw how the application setting influences the set of choices and trade-offs that users face. Accordingly, the architectures discussed in this chapter are not mutually exclusive. Instead, they can and will coexist within different applications and environments. For this reason, we feel it is important to identify the most typical uses for each of the architectures. To establish a basis for comparing the different architectures, we construct a simple yet nontrivial example, and examine it in the light of each architecture.

4.2.1 Example Scenario and Assumptions

In this chapter, we use the example of a user providing an address to an application and requesting a geo-referenced image centered at that address.1 Since our focus is on the components of the architectures, few assumptions are made about the client. In this case, the client can be an Internet browser, an application running on a mobile device, or part of a larger application. One aspect of this example's simplicity is that the GIS data types handled are limited to raster imagery; the example therefore avoids the additional complexities of heterogeneous semantics and topology representations. Nevertheless, despite its simplicity, the example is rich enough for studying architectural issues and exploring the trade-offs in various service chaining approaches. The services used in the example belong to three categories:

- An address matching service (e.g.
the Etak service introduced in Section 1.2.2): According to the GeocoderService RFC draft submitted to OGC, an address matching service transforms a phrase or term that uniquely identifies a feature, such as a place or an address, into applicable geometry (usually either an (x,y) coordinate or a minimum bounding rectangle). For simplicity, we assume that the service used by our client provides (x,y) coordinates in any projection and coordinate system specified by the user.2 Typical address matching services return more than just the (x,y) coordinates. Additional information such as a normalized address, matching precision and the location's census tract are often appended to the coordinates. However, for the sake of simplicity, we assume that the additional information can be filtered out such that the client receives only the coordinates in response to a request. In some cases, address matching services return several locations matching a given address. In such cases, user intervention may be necessary to determine the intended address.

- A map service (e.g. the MITOrtho server): The map server returns a map corresponding to pre-specified geographic and pixel dimensions of an area. In our example, we use the map and capabilities interfaces described in Appendix B.

- A re-projection service (e.g. the one developed in Chapter 3): This service is needed because the native projection of a data set may not be appropriate, depending on the application, the scale at which the data is requested, and the projection system of other data sets used by the application.

1. For the sake of simplicity, we assume that the size of the image in pixels is fixed.
2. The Etak service returns the coordinates only in latitude/longitude.
[Figure 4.1 omitted: input/output diagram of the three services. Re-Projection Service: inputs imageURL, geoInfo, fromSRS, toSRS, resampling, outputZoom; output image +/- geo-referencing information. Map Service: inputs layer, bounding box, width, height, format, SRS; output image. Address Matching Service: inputs address, SRS; output (x,y) coordinates.]

Figure 4.1 Illustration of services used in the example.

Figure 4.1 provides an input/output illustration of services from the three categories above. We emphasize that the client in our example is not limited to a particular service from each category. For instance, the client can access different map servers covering different geographical areas. Similarly, depending on the application, the client might have the option of using an approximate re-projection service rather than an exact one, or of accessing a service that specializes in certain transformations.

In terms of authentication, we assume in our example that the authentication of the client by the services can be handled using available authentication technologies, such as Kerberos, cookies or basic HTTP authentication. Similarly, we assume billing for services and data can be managed using current e-commerce approaches, such as Ecash (www.ecash.com), CyberCash (www.cybercash.com) or PayPal (www.paypal.com).

4.2.2 Focus of Analysis

As mentioned in Section 1.2, there is a wide range of issues surrounding the design of a scalable and extensible geo-processing infrastructure. Accordingly, for the example to offer any analytical depth, we need to focus on a subset of these issues. This subset covers the identification of core components of the underlying architectures, as well as their role in service chaining vis-a-vis performance, metadata tracking and error reporting. Additionally, we use the service chaining discussion to address issues of complexity of dialogs and data structures, and their implications for the client's thickness and intelligence capabilities.
The focused analysis will serve as a basis for a broader discussion of the issues in the next chapter. For instance, the analysis will help us understand how the distributed geo-processing infrastructure can fit within an ASP world, and how GIS can be integrated with mainstream IT technologies.

4.3 Abstraction Level 1: Decentralized Architectures

At the lowest level of abstraction, the architecture's only components are the geo-processing services. At this level, service chaining and management are handled exclusively by the client. The next section provides a basic overview of services, followed by an analysis of service chaining in the decentralized environment.

4.3.1 Geo-Processing Services as Basic Components

A service provides access to a set of operations accessible through one or more standardized interfaces. In the process, the service may use other external services and operations. Services can be grouped into two categories: data services and processing services. Data services, such as the MITOrtho server, offer customized data to users. These services are tightly coupled with specific data sets, and therefore their capabilities describe both the interfaces they support and the data they offer.1 Processing services, on the other hand, are not associated with specific datasets. Instead, they provide operations for processing or transforming data in a manner determined by user-specified parameters. Processing services can provide generic processing functions, such as projection/coordinate conversion, rasterization/vectorization, map overlay, feature detection and imagery classification. Processing services also encompass the set of image manipulation services, which include resizing images, changing colors, computing histograms, applying various filters [79] and performing geospatial statistical analysis [50]. These services can be specialized for particular fields such as forestry, land use, agriculture or transportation.
1. The metadata about the data can be specified according to ISO 19115 [58].

Clients can use the getCapabilities interface to retrieve a machine-parsable description of the operations (and data) supported by a service (see Appendix B.2). The capabilities listed in the appendix may be further extended to include real-time information about the service, such as its current load and the estimated processing time for the next request. Such information may be critical to some clients (such as mobile ones) that exhibit time and performance restrictions. A service's capabilities can also be used to store information about the groups of authenticated users allowed to access that service. Furthermore, they can be used to convey the cost of using the service. The cost of use may vary depending on the type of user (individual, institution, government entity), on the amount of data/processing requested, and on the usage frequency. In e-commerce settings, clients may probe services for their price offers for specific transactions before committing to the use of these services. Additionally, services may provide free demo or limited versions that allow users to sample their basic capabilities.

4.3.2 User-Coordinated Service Chaining

Given the absence of coordinating entities in the decentralized setting, a client (such as the one in the example described in Section 4.2.1) is fully responsible for managing all interactions with the services. In this decentralized setting, the client must have prior knowledge of all service locations, maintained as a hardcoded list of the services it uses. The rules used to decide which services are used at any point in time are also hardcoded in the client. These rules are applied when the client needs a new set of services to use. However, in most cases, consecutive transactions use the same services.
In order to avoid repeated retrieval and parsing of the capabilities of frequently-used services, the client may save a local copy of this information. The frequency at which the information is refreshed depends on how much it is likely to change from one request to the next, or from one session to another.

Figure 4.2 illustrates the extensive amount of work performed by the client in our example to obtain the requested image. For every request, the client constructs and sends queries to the services involved. It coordinates the sequence of requests as well as the transfer of information between services. The client must also handle the intermediate results. For the seemingly simple task of retrieving an image centered at an address, the management responsibilities of the client are considerable. These responsibilities will only multiply in the case of more elaborate chains involving more services.

[Figure 4.2 participants: Client, Re-Projection Service, OrthoImagery Service, Address Matching Service]
1. Client sends address to Address Matching service.
2. Address Matching service returns a list of possible matching geographic locations. Client picks one address from the list and then goes through a prepared list of imagery services to locate a service covering the selected geographic location. Client finds a suitable OrthoImagery service and sends a getCapabilities request to double-check that the service is up and indeed covers the area of interest. In the process, the client finds that the selected service cannot return the image in the client's projection system.
3. Client requests image from OrthoImagery service in a projection system supported by the service.
4. OrthoImagery service returns image in its native projection system. From a prepared list of Re-Projection services, the client picks a service that can transform the image into the desired system.
5. Client uses selected Re-Projection service to transform the saved image.
6. Re-Projection service returns final image to client.
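The client-coordinated sequence above can be sketched as the ordered set of requests the client issues. This is a Python sketch with hypothetical endpoints and parameter names standing in for the services of Figure 4.2; the download and re-upload of each intermediate result is elided.

```python
from urllib.parse import urlencode

# Hypothetical endpoints standing in for the three services of Figure 4.2.
ADDR_SVC = "http://www.addmatch.example/addressMatch.cgi"
ORTHO_SVC = "http://ortho.mit.example/wmtserver.cgi"
REPROJ_SVC = "http://coast.mit.example/reproj.cgi"

def build_chain(address, x, y, half=0.005):
    """Return the ordered requests the client issues for Figure 4.2.

    Between steps the client would download each intermediate result and
    hand it to the next service; that storing and forwarding is elided.
    """
    # Step 1: match the address to a geographic location.
    steps = [ADDR_SVC + "?" + urlencode({"address": address, "SRS": "EPSG:26986"})]
    # Step 3: request the image around the matched point (x, y).
    bbox = ",".join(str(v) for v in (x - half, y - half, x + half, y + half))
    steps.append(ORTHO_SVC + "?" + urlencode(
        {"request": "map", "bbox": bbox, "SRS": "EPSG:26986", "format": "jpg"}))
    # Step 5: re-project the locally saved image into the desired system.
    steps.append(REPROJ_SVC + "?" + urlencode(
        {"imageURL": "file:///tmp/image.jpg", "fromSRS": "EPSG:26986",
         "toSRS": "EPSG:4269"}))
    return steps
```

Even in this stripped-down form, the client owns the sequencing, the intermediate storage, and the knowledge of every endpoint.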
Table 4.1: Simplified service requests with input/output parameters.

Request       Inputs                                         Output
AddressMatch  address="77 Mass Ave Cambridge MA 02139",      x,y
              SRS=EPSG:26986
Map           bbox=x-a,y-b,x+a,y+b, SRS=EPSG:26986,          image.jpg
              width, height, format, etc.
ReProjImage   imageURL=image.jpg, fromSRS=EPSG:26986         reprojected
              (Mass State), toSRS=EPSG:4269 (lat/lon)        image.jpg

Figure 4.2 User-coordinated service chaining in decentralized architectures.

Local storage of intermediate results at the client is one shortcoming of service chaining as depicted in Figure 4.2. However, this storage is necessary because of the process the client follows for constructing the chain. This process involves calling the services successively, storing the returned output of each service, and subsequently forwarding it to the next service in the chain. One way to overcome the shortcoming of local storage is to directly embed, in the input to one service, a request to the following service in the chain. This approach, shown in Figure 4.3, leverages the closure characteristic of services mentioned in Section 3.3.2. The same approach is employed in SQL, whereby a query can be embedded as a parameter in another query, hence avoiding the need to retrieve and temporarily save its results in a table before using them. Even though the services in Figure 4.3 appear to be calling each other, it is not necessary for them to know or understand each other's interfaces. The construction of the requests and subrequests is still managed entirely by the client.

[Figure 4.3 participants: Client, Re-Projection Service, OrthoImagery Service, Address Matching Service]
The client constructs a request leveraging service closure to chain the services. The calls to the participating services are nested in one long call. An example of pseudocode for the request is:

Reproj.cgi?fromSRS=EPSG:26986&toSRS=EPSG:4269&
  imageURL=URLencode(ortho.mit.edu/wmtserver.cgi?request=map&
    bbox=function(www.addMatch.com/addressMatch.cgi?
      address="77 Mass Ave Cambridge MA"&SRS=EPSG:26986))

1. Instead of downloading the image itself, the client provides the re-projection service with the URL of the image that it needs. The geographic coverage of the image in turn depends on the results of the address matching service.
2. As a first step, the re-projection service retrieves the image pointed to by the imageURL.
3. The bounding box of the image depends on the address matched by the Address Matching service.
4. The Address Matching service returns the coordinates to the imagery service.
5. The OrthoImagery service now has all the parameters needed to construct the image. The image is returned to the calling service, namely the Re-Projection service.
6. The Re-Projection service now has all the inputs needed to perform the re-projection. It re-projects the image and returns it to the client.
*. For the sake of simplicity, we assume that the case of multiple address matches is handled through a separate dialog between the client and the Address Matching service (not shown here).
Figure 4.3 Using nested calls for service chaining.

4.3.3 Complexities of Nested Calls

While nesting calls promises to simplify some of the coordination responsibilities of the client, it nevertheless introduces complexities in the areas of error and metadata propagation, as well as in the client's ability to control certain details. In Figure 4.2, the client communicates directly with the individual services. Consequently, when an exception is generated by a service, the client has first-hand knowledge of that exception. As mentioned in Section 3.3.2, exceptions can be sent either as XML documents, or as files in a format that is expected by the user. In either case, the direct link to services enables the client to easily detect the exception and respond accordingly. With nested calls, informing the client of the exact nature of an exception is more complicated.
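The nested request of Figure 4.3 amounts to successive URL-encoding of inner requests into outer ones. A Python sketch follows, with hypothetical endpoints and with the URLencode of the pseudocode played by urllib's quote.

```python
from urllib.parse import urlencode, quote

def nested_request(address):
    """Build one request in which each inner call is URL-encoded as a
    parameter of the next outer call, mirroring Figure 4.3."""
    # Innermost call: address matching.
    addr_req = "http://www.addmatch.example/addressMatch.cgi?" + urlencode(
        {"address": address, "SRS": "EPSG:26986"})
    # Middle call: map request whose bbox is derived from the inner result.
    map_req = ("http://ortho.mit.example/wmtserver.cgi?request=map"
               "&bbox=function(" + quote(addr_req, safe="") + ")")
    # Outermost call: re-projection of the image the middle call produces.
    return ("http://coast.mit.example/reproj.cgi?fromSRS=EPSG:26986"
            "&toSRS=EPSG:4269&imageURL=" + quote(map_req, safe=""))
```

Each level of nesting adds a level of encoding, so an intermediate service only ever sees an opaque URL to resolve, never the inner service's interface.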
Consider for example the case of the address matching service issuing an exception in response to an erroneous call from the client. To the orthoimagery service in Figure 4.3, this exception is viewed as an invalid input, triggering the orthoimagery service to signal an "invalid input" exception. A domino effect ensues, as the same problem occurs over the orthoimagery/re-projection link, forcing the re-projection service to signal its own "invalid input" exception (this time to the client). Although the client is eventually informed of the occurrence of an exception in the chain, the actual exception received by the client does not disclose information about the source or the cause of that exception. One way to overcome this limitation is to allow a service to automatically forward a received exception input, as is, to the next service in the chain, while appending to the forwarded exception any of the service's own error messages. In this context, representing exceptions in XML is particularly useful as it makes it easier for services to detect and add to incoming exceptions. An example is shown below. The client can easily parse the XML-formatted exception and access the innermost error message to determine the root cause of the problem.

<WMTException version="1.0.0">
  <Service="http://coast.mit.edu/reproj.cgi">
  <wmtserver-001: Invalid input>
  <WMTException version="1.0.0">
    <Service="http://ortho.mit.edu/wmtserver.cgi">
    <wmtserver-001: Invalid input>
    <WMTException version="1.0.0">
      <Service="http://www.addMatch.com/addmatch.cgi">
      <addMatch-35001: Unsupported SRS>
    </WMTException>
  </WMTException>
</WMTException>

This approach of relaying exceptions to the client can be extended to handle metadata propagation. One example of metadata is information about billing from individual services. Metadata can be appended to normal data as it is passed between services.
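A sketch of this relay approach follows: each service wraps a received exception inside its own before forwarding it, and the client descends to the innermost element to find the root cause. Well-formed element and attribute names are assumed here (the notation above is schematic), and the endpoints are hypothetical.

```python
import xml.etree.ElementTree as ET

def wrap_exception(own_service, own_message, incoming_xml=None):
    """Service side: emit our own exception, appending any received
    exception document unchanged as a child."""
    exc = ET.Element("WMTException", version="1.0.0", service=own_service)
    ET.SubElement(exc, "Message").text = own_message
    if incoming_xml:
        exc.append(ET.fromstring(incoming_xml))  # forward as-is
    return ET.tostring(exc, encoding="unicode")

def root_cause(xml_text):
    """Client side: descend to the innermost exception to find its source."""
    node = ET.fromstring(xml_text)
    while node.find("WMTException") is not None:
        node = node.find("WMTException")
    return node.get("service"), node.findtext("Message")
```

The same wrapping mechanism can carry billing or other metadata alongside the error messages as documents travel back up the chain.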
However, for services to process and exchange documents containing both the data and its metadata, a standard is required, as was discussed in Section 3.5.1.

Finally, we consider the issue of an unexpected delay occurring in one of the services in a chain. The serial nature of the chain implies that the delay propagates through the chain, all the way to the client. In the scenario where the client directly accesses each service, as shown in Figure 4.2, the client can abort the operation if a specified time-out period for a service expires. In that case, the client can opt for a substitute for the timed-out service. However, with nested calls, the client loses the direct connection to individual services, and must wait for the final overall result. In order to control the length of this waiting period, a new global time-out may be introduced. This time-out is controlled by the client and is communicated to every service in the chain. At any point, if a service takes longer than this global time-out, it aborts and returns an appropriate exception to the preceding service in the chain. Despite its simplicity, this method of handling time-outs requires the addition of at least one parameter to the services' interfaces, namely the time-out duration. This duration ought to depend on the statistics of the processing time for a service, and on the type of the client. Mobile clients, for instance, are expected to need shorter time-outs given their mobility and the critical dependence of their requests on location.

The efficiency of chains can be further improved through data compression. Compressing data decreases transmission time, although possibly at the expense of increased processing time at the service. Nevertheless, there ought to be a trade-off point, which depends on the nature of the services, their operations and the bandwidth available between them.
Given the potentially large size of raster data (depending on resolution and extent), data compression may be particularly beneficial in our example. If data compression is used for raster imagery, then there will be a need for an open standard (which does not yet exist). Candidates for such a standard include the Multi-Resolution Seamless Image Database (MrSID) and the Enhanced Compressed Wavelet (ECW) algorithm [64].

4.3.4 Issues and Implications

The detailed analysis of the example illustrates how service chaining in a decentralized setting requires deep involvement of the client. In other words, a significant amount of complexity is imposed on the client in order to (1) locate the services and data needed for the task and (2) coordinate the dialogs among the selected services. This complexity directly contradicts the "ease of use" requirement for wide-based adoption of a distributed geo-processing infrastructure (as discussed in Section 3.5.3). In the remainder of this chapter, we focus on identifying alternative ways of minimizing this complexity.

In terms of locating data in a decentralized setting, an information geo-indexing scheme (such as the .geo proposal presented in Section 2.3.6) can prove to be valuable. Such a scheme simplifies the process of dynamically searching for data according to its location. However, given the global extent of such a scheme, issues of data quality assurance may quickly arise, and can easily push for controversial data certification procedures. Furthermore, such a scheme does not address the problem of locating geo-processing services, as the physical location of these services is likely to be independent of the functionality they provide. Section 4.4 introduces catalogs as a more general and scalable way to address this problem.

The other source of added complexity for the client derives from its responsibility for coordinating dialogs among different servers.
In this context, interoperability and simplicity of service interfaces are critical, as they spare the client the burden of handling multiple interfaces for similar services. Similarly, the use of XML for standardizing data and metadata exchanges between services contributes to minimizing this aspect of the client complexity. In the next section, aggregate services are introduced as a method for hiding the complexity of service chaining from the client.

4.3.5 Aggregate Services

Aggregate services bundle pre-defined chains of services and present them to the client as one. By handling all control and interaction among the services, they hide the complexity of service chaining from the client. Aggregate services can be thought of as extending the capabilities of one service by combining it with another one. For instance, an ortho imagery service can be extended to handle additional reference systems by combining it with a generic re-projection service. Figure 4.4 illustrates how the chain in our example can be hidden from the client in a black box.

[Figure 4.4 Aggregate services: the Address Matching, Imagery Map and Re-Projection services (with their address, SRS, layer, width/height/format, imageURL, fromSRS and toSRS parameters) are bundled behind a single interface.]

By bundling complementary services into one aggregate service, the aforementioned complexities of error reporting, metadata propagation, as well as authentication and accounting, are completely hidden from the client. For all intents and purposes, the client will not be able to distinguish between an aggregate service and a basic one. In many cases, we expect the constituent services of an aggregate service to be supplied by the same provider. In such cases, better efficiencies can be achieved by using proprietary protocols for the communication among the constituent services. Despite their benefits, aggregate services have some drawbacks.
By having a single access point to the chain, the client loses some of the flexibility and control over the parameters of the individual services. For instance, in the example depicted in Figure 4.4, the client has no control over the re-projection step. In fact, the client is not even aware that the image is being re-projected. The invisibility of this step and the assumptions made by the aggregate service can be misleading to some clients. Consequently, clients should be able to differentiate between basic and aggregate services. Such information can be communicated to the user via the capabilities file, which may include a flag for aggregate services, and a link to more information about their constituent services.

Unfortunately, aggregating services and presenting them as one service to the client has a negative impact on the size of the capabilities of the aggregate service. In our example, the aggregate service capabilities file needs to include the long list of all possible SRSs that can now be supported by the service. As the number of constituent services grows, the combination of available options also grows, albeit at a much faster rate. This in turn increases the time it takes to transmit the capabilities information from the service to the client. In many cases, however, the increase in the transmission time is negligible, given that this information is text-based. A more serious drawback is the additional processing needed by the client to parse the longer capabilities file. Fortunately, there is a more flexible and scalable alternative to aggregate services in a distributed environment where static binding of services and calls is often not efficient. This alternative, namely mediating (or smart) services, is covered in Section 4.5.
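For illustration, the black-box behavior of the aggregate service in Figure 4.4 can be sketched with its constituent services stubbed out. This is a Python sketch under assumed names; the real constituents would be remote calls, possibly over proprietary protocols.

```python
# Constituent 1 (stub): imagery service that only serves its native SRS.
def ortho_native(bbox):
    return {"bbox": bbox, "srs": "EPSG:26986", "data": b"..."}

# Constituent 2 (stub): generic re-projection service.
def reproject(image, to_srs):
    return dict(image, srs=to_srs)

def aggregate_map(bbox, srs):
    """Single entry point of the aggregate service: re-projects
    transparently whenever the requested SRS is not the native one.
    The client never learns that a second service was involved."""
    image = ortho_native(bbox)
    if srs != image["srs"]:
        image = reproject(image, srs)
    return image
```

The invisible branch inside aggregate_map is exactly the step the text warns about: the client cannot see, tune, or veto it.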
4.4 Abstraction Level 2: Federated Architectures with Catalogs

From the discussion above, it becomes clear that a certain level of control is needed to prevent the decentralized architecture from becoming chaotic. Federated architectures provide this minimal control by finding a balance between the flexibility of decentralized setups and the coordination advantages of centralized ones [26]. In a federated architecture, the individual services are still loosely coupled, and maintain their autonomy. There is still no central controlling authority in the system. Instead, catalogs are introduced to enable clients and services to find, query and browse services on the network [100].

4.4.1 Catalogs for Service Discovery

Catalogs have long been recognized as providing an efficient way of organizing and condensing knowledge of large collections of items [39]. They provide a set of common services to support local and global information discovery, metadata retrieval, information browsing, cataloguing and indexing. The primary function of catalogs is to help locate the address of a dataset or a service based on its metadata record. This in turn maintains the transparency of the location of the data and services. Catalogs can index services by several categories, such as location, service type, provider, cost, domain, certification or even performance statistics. In most cases, catalogs will keep local copies of the basic capabilities of the services they point to. These capabilities are updated either directly by the service provider, or by occasionally polling services for their capabilities. In a distributed infrastructure, several catalogs may be available, and may even point to each other. Given the existence of many catalogs, they too need interoperable interfaces.
Indeed, the OGC is working on the Catalog Services Abstract Specification, which defines interoperable spatial data catalogs that can be used to discover spatial data holdings in different environments [79]. The specification also includes interfaces for defining, adding, removing and modifying entries in catalogs. The OGC work is also linked to the ISO TC/211 work in the area of metadata content.

4.4.2 Service Discovery and Chaining

Figure 4.5 illustrates the interactions between the client, catalog and services in a federated setup. The introduction of catalogs relieves the client of the burden of service discovery. However, the client is still responsible for deciding which services to use after consulting the catalog, as well as for specifying the details of service chaining. In some cases, the catalog may return several services as matches to the client's query, in which case the client may need to query the services directly for their detailed capabilities, and make a decision accordingly.

[Figure 4.5 participants: Catalog, Client, Re-Projection Service, OrthoImagery Service, Address Matching Service]
1. Client queries the catalog for an Address Matching service and an OrthoImagery service covering the area of interest. If needed, the client also uses the catalog to find a suitable Re-Projection service. For each query, the catalog may return one or more addresses for services that can be used [39].
2. Depending on their design, catalogs may occasionally update their listings by querying services for their capabilities. Services that are registered in a catalog can trigger this updating process whenever their capabilities are modified.
3. While the client does not need to store lists of the services it needs, it still has to manage and coordinate service chaining.
Figure 4.5 The role of catalogs in a federated setup.

4.4.3 Issues and Implications

Figure 4.5 illustrates how catalogs may be used by clients to look up the services they need for their chains.
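A catalog lookup of the kind shown in step 1 of Figure 4.5 might be sketched as follows. This is a Python sketch; the record fields, endpoints and coverage values are hypothetical.

```python
# A catalog as a list of service records indexed by type and coverage
# (all fields and URLs hypothetical).
CATALOG = [
    {"url": "http://ortho.mit.example/wmtserver.cgi", "type": "imagery",
     "bbox": (-73.5, 41.2, -69.9, 42.9), "cost": 0.0},
    {"url": "http://images.example/map.cgi", "type": "imagery",
     "bbox": (-125.0, 32.0, -114.0, 42.0), "cost": 1.5},
    {"url": "http://coast.mit.example/reproj.cgi", "type": "reprojection",
     "bbox": None, "cost": 0.0},
]

def query(service_type, point=None):
    """Return catalog records of a given type, optionally restricted to
    those whose coverage box contains the given (x, y) point."""
    hits = [r for r in CATALOG if r["type"] == service_type]
    if point is not None:
        x, y = point
        hits = [r for r in hits if r["bbox"] is not None
                and r["bbox"][0] <= x <= r["bbox"][2]
                and r["bbox"][1] <= y <= r["bbox"][3]]
    return hits
```

The catalog answers the "where is a suitable service" question; picking among multiple hits, and chaining them, remains the client's job.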
Although introducing catalogs to the architecture simplifies the design of a client, the client is still required to construct and manage the service chains. This implies added client responsibilities for communicating with catalogs and interpreting their results. Furthermore, if queries and their results are encoded in XML, then clients need the local intelligence for constructing and parsing sophisticated XML documents. For most applications, however, the communication with catalogs does not have to be frequent. Service addresses retrieved through earlier queries can be saved locally at the client, and subsequently used without referring back to a catalog.

In terms of implications for GIS market dynamics, we believe catalogs will be made available by both the public and the private sectors. In the public domain, many governmental entities already maintain a variety of detailed datasets covering their jurisdiction, and are increasingly interested in facilitating public access to this data. A federated setup for keeping track of the data available nationwide goes a long way towards making this data more accessible to a wider range of users and applications. In this federated setup, each state may designate an agency to coordinate the distributed efforts of geographic data collection, storage and dissemination within that state.1 In this context, the designated state agency can host catalogs which index the agency's own data, and point to other catalogs/data provided by individual counties or towns within that state.

In the private sector, catalogs are likely to be supplied by satellite imagery providers. These providers can use interoperable catalogs to maintain indices for the images they collect, as well as pointers to generic geo-processing services commonly needed to manipulate these images. More geo-processing services are likely to be provided by current big GIS software providers such as ESRI and Intergraph.
Such players are likely to be interested in catalogs as a way to advertise their own services and provide a one-stop shop for geo-processing services. Given their known brand names and their dominance in the GIS market, it will not be surprising if these big players charge a higher premium for their services. It remains to be seen how soon such dynamics will materialize, as they require the big players to shift their business models and modify their system architectures, which they have been slow to do.

Catalogs also offer interesting opportunities for new players in the GIS and IT markets to provide more sophisticated search tools. As the number of services and catalogs available in an environment grows, there will be an increasing need for search-engine-like tools that can consolidate information retrieved from various catalogs. Such tools may also provide interfaces through which users can pick the services they need. Furthermore, these tools can dispatch the users' requests to a variety of available catalogs, and then allow users to sort the results according to different criteria, e.g., price, quality or provider. As such, these tools are similar to popular online price comparison sites (e.g., metaprices.com or mysimon.com), which allow users to pick a category of items to compare (e.g., CDs, books, electronics) and then return a list of items along with their prices, availability, special offers and reviews from various online shopping websites (e.g., Amazon and Barnes & Noble). However, even with the aid of such sophisticated tools, the complexity of service chaining from a client's perspective is still high. The next section complements our discussion of federated architectures by introducing mediating services as efficient agents that can further raise the client's level of abstraction when constructing a service chain.
1. For example, MassGIS is the logical choice for such an organization for the state of Massachusetts, as it is already responsible for the collection, storage and dissemination of geographic data in Massachusetts.

4.5 Abstraction Level 3: Federated Architectures with Mediating Services

In order to relieve clients from explicitly manipulating multiple service connections and handling intermediate results, we introduce mediating services. Mediating (or smart) services act as gateways to other services by coordinating between multiple services without necessarily storing any data of their own. Mediating services combine the simplicity of aggregate services with the flexibility and control inherent in decentralized architectures.

4.5.1 Mediating (Smart) Services

The concept of mediating or smart services is borrowed from the database arena. In a distributed database setting, mediating elements are often introduced to dynamically convert multi-database queries into smaller sub-queries that can then be dispatched to the various databases. The results of the sub-queries are then integrated by the mediating element and returned to the client. In the database literature, these mediating elements are also referred to as facilitators, brokers, and dispatchers [36]. Correspondingly, in a distributed geo-processing infrastructure, mediating or smart services dynamically construct and manage chains of services. Based on their clients' requirements, mediating services determine appropriate data sources and services, retrieve and process the data, and then assemble the final response. In the process, a mediating service may consult catalogs, search engines or meta-search tools known to it. It can also keep its own indexed lists of useful services, which are more likely to be biased towards certain providers and/or domains. For efficiency purposes, mediating services may also provide commonly used basic functions, such as format, coordinate or vector-to-raster conversions.
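The core loop of such a mediating service (resolve a chain for a task, execute it, and reuse the resolved chain on subsequent requests) might be sketched as follows. This is a Python sketch; the lookup and the services are stubbed as callables, and all names are hypothetical.

```python
# Cache of resolved chains: once a suitable chain is found for a task,
# subsequent requests for that task reuse it.
_chain_cache = {}

def resolve_chain(task, lookup):
    """Consult a catalog-like lookup once per task; reuse thereafter."""
    if task not in _chain_cache:
        _chain_cache[task] = lookup(task)
    return _chain_cache[task]

def execute(task, request, lookup):
    """Run the request through the task's chain of services, feeding each
    service's output to the next (services are callables in this sketch)."""
    result = request
    for service in resolve_chain(task, lookup):
        result = service(result)
    return result
```

A client-visible policy such as "re-construct my chain once per session" would reduce to a rule for invalidating entries in this cache.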
Moreover, mediating services may use pre-specified client preferences to search for appropriate data and processing services that best meet their clients' requirements. Such preferences might include information about service time-outs, price ceilings, accuracy requirements, and the maximum number of services chained. In some cases, the client may also wish to specify a preference for a particular service provider. The client may also impose a constraint that all services used in a particular session be supplied by the same provider, presumably to achieve certain efficiencies as well as monetary savings.

4.5.2 Mediating Services and Service Chaining

Figure 4.6 illustrates how the complexities of service chaining can be alleviated by a mediating service. With much of the complexity hidden from the client, the desired thinness of the client is restored. As discussed earlier, client thinness is a critical element in promoting distributed geo-processing infrastructures.

[Figure 4.6 participants: Client, Smart server, Catalog, Re-Projection Service, OrthoImagery Service, Address Matching Service]
1. Client is shielded from the complexity of locating services and coordinating their chaining. Client either specifies its preferences to the mediating (or smart) service by having an accessible local preference file, or registers its preferences with the smart service a priori.
2. Smart server maintains current state. A sophisticated smart server can try to anticipate the next request of the client, or cache some results that are not likely to change. Client can specify the frequency at which the latest results are refreshed at the smart server.
3. & 4. Dialogs between smart servers and catalogs are the same as in the earlier described architectures (Figure 4.2 and Figure 4.5).
Figure 4.6 Service chaining with mediating (smart) services.

4.5.3 Issues and Implications

Examining the scenario in Figure 4.6, we note that the mediating service need not construct a new chain of services for every client request.
In most cases, once the mediating service identifies a suitable chain for a particular task, subsequent requests for that task will use that chain, unless otherwise specified in the client's preferences. Depending on the application, the client may specify that its chain be re-constructed only once per session or per request. This flexibility, however, comes with a caveat: if the chain is re-constructed for each session, then consecutive sessions are not guaranteed to be using the same sets of data or services. Whether this is acceptable to the client depends on the nature of the client and its application.

The client may also want to specify that a particular chain used in an earlier session be used for the current session. One way to implement this feature is to save the desired chain as a cookie on the client, and communicate it to the mediating service with each request. Alternatively, the client can use the existing chain, and manage the service chaining itself. In this case, the mediating service is used as a tool (or an agent) by the client to determine the optimal chain of services needed for a particular task. It is then up to the client to execute that chain. However, this alternative may be more expensive to the client, since subscribing to one mediating service is likely to be less expensive than subscribing to a variety of individual data and transformation services. Furthermore, given that a mediating service will serve multiple clients, it will be in a better position to negotiate better deals with the individual services.

Finally, it is important to point out that at the heart of any mediating service lies a set of predefined rules, which dictate the optimal selection and chaining of services for each task. These rules are used to match clients' preferences with appropriate services and data sources (accessible by the mediating service).
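Such preference-matching rules might be sketched as simple predicates over service records. This is a Python sketch; all field names (price ceiling, time-out, provider) are hypothetical illustrations of the preferences discussed above.

```python
def matches(service, prefs):
    """True if a candidate service record satisfies the client's preferences.
    Unspecified preferences default to 'no constraint'."""
    if service.get("price", 0.0) > prefs.get("price_ceiling", float("inf")):
        return False
    if service.get("avg_response_s", 0.0) > prefs.get("timeout_s", float("inf")):
        return False
    provider = prefs.get("provider")
    if provider is not None and service.get("provider") != provider:
        return False
    return True

def select(candidates, prefs):
    """Filter catalog candidates down to those a chain may be built from."""
    return [s for s in candidates if matches(s, prefs)]
```

Domain-specific "smarts" would live in richer predicates of this kind, tuned to the semantics of a particular field.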
As such, mediating services can be considered as specialized versions of existing process management and integration tools. The need for specialization is a consequence of the distinctive characteristics of GIS (see Section 3.5.4), especially the semantics associated with GIS data and the complexity of spatial queries. With the wide range of possible GIS applications and the different semantics needed in different fields, it is likely that the internal rules of mediating services will be tuned to specific application domains. Therefore, in terms of market dynamics, we foresee the emergence of a variety of mediating services, ranging in their "smarts" as well as in the nature of their specialization. The need for domain-tuned services will constitute excellent market entry opportunities for third-party players with significant expertise in a certain domain, but without the capabilities to single-handedly offer and maintain all the data and transformations needed for that domain.

In summary, mediating services promise to minimize the complexity of service chaining while providing clients with solutions that are specifically tuned to their preferences. A mediating service also provides a client with a single point of contact for accounting and authentication, as well as error and metadata reporting. Naturally, and as we have seen in other areas, standards will be needed in these areas to make the infrastructure scalable and the design of mediating services simpler.

4.6 Summary

The architectures, their components and the key issues presented in this chapter are summarized in Table 4.2. In the final chapter, we consolidate the issues we have uncovered thus far in this thesis. We formulate a framework/roadmap to help developers/users make the "best" choices for building/using a distributed geo-processing infrastructure.
We then use this framework to study the implications of our analysis on the set of standards and protocols that are needed to build and support a distributed geo-processing infrastructure.

Table 4.2: Summary of architectures: components, service chaining and issues.

Architecture: Decentralized
- Components: services, aggregate services
- Coordination of service chaining: client, including error handling, accounting and authentication
- Key issues: complexity of service chaining management for clients; simplicity of interfaces; difficulty of locating services
- Potential influential organizations: OpenGIS for determining service interfaces and categorization; compression standards; data exchange standards

Architecture: Federated with catalogs
- Components: services, aggregate services, catalogs
- Coordination of service chaining: client still responsible for the chaining; catalogs are used to locate and query services
- Key issues: catalog interfaces; service registration in catalogs; catalogs can be sophisticated and act like search engines or meta-search engines; opportunities for new and old players, public and private
- Potential influential organizations: .geo for registries; standards for catalogs; XML for catalog data return and parsing; governmental catalog and metadata coordination (linked to NSDI); private companies for the provision of specialized services; standards for metadata

Architecture: Federated with mediating services
- Components: services, aggregate services, catalogs, mediating (smart) servers
- Coordination of service chaining: tuned smart servers
- Key issues: error propagation; metadata propagation; specification of preferences; degree of "intelligence"
- Potential influential organizations: same as above, in addition to private companies in various industries providing smart servers for niche markets; ISO & OpenGIS for metadata entities

Chapter 5 Synthesis and Conclusions

5.1 Summary of Research

This thesis is motivated by the growing need for a new distributed GIS model, in which GIS functionalities are delivered as independently-provided yet interoperable services (Point 2 in Figure 5.1).
Such a model is especially beneficial for scientific research and engineering modeling as well as state and federal government settings, where tightly coupled hierarchical systems are unlikely to provide the desired breadth and flexibility. A distributed geo-processing model allows users in these settings to freely combine services to create customized solutions with minimal programming, integration and maintenance efforts. Furthermore, a distributed geo-processing infrastructure can facilitate the integration of GIS with other information systems, and can also support the needs of thin mobile clients for location-based services. A distributed infrastructure will hence be a key enabler for GIS to extend beyond its traditional boundaries of mapping to embrace a broader community of users and a wider scope of services. This thesis focuses on identifying the major issues associated with building and using such a distributed infrastructure. Recognizing that there is no "one size fits all" solution in distributed GIS, one goal of this thesis is to consolidate these issues into a framework (or roadmap) that can be used by developers and users to navigate through the choices available to them. In following the framework, developers and users will still need to make implementation decisions that will often be based on trade-offs between conflicting requirements. The major issues were assembled by approaching the research from three complementary perspectives. The background research presented in Chapter 2 supported the viability of a distributed model, particularly in light of the enabling IT technologies and growing e-commerce and ASP markets. This part of the research also reinforced the crucial role standards play in such a distributed environment. In Chapter 3, we presented the synthesis of our own experience with service design, implementation and chaining.
This experience highlighted some practical aspects of building and using distributed services, and contributed to the identification of key characteristics of GIS that set a distributed geo-processing infrastructure apart from other distributed IT systems. Finally, in Chapter 4, we analyzed alternative architectures for realizing a distributed infrastructure and identified their basic components. We concluded that the next generation GIS model is likely to follow a federated architecture, where the basic components consist of services (basic and aggregate), catalogs and mediating services. The discussion in Chapter 4 also shed some light on expected dynamics in the new GIS marketplace, highlighting potential opportunities for new players to target a variety of niche markets by packaging some of the architectural components in ways that best serve the target markets.

Figure 5.1 The shift to distributed interoperable GIS components. (The figure positions GIS systems along three axes: degree of interoperability, degree of componentization and degree of distribution. Point 1 marks a local non-interoperable GIS package; Point 2 marks fully interoperable distributed GIS components.)

As these issues were identified, it quickly became evident that the complexity of navigating through available choices explodes a lot sooner than expected, even for simple scenarios. This complexity is intensified by the heavy dependence of the distributed geo-processing infrastructure on a variety of technological areas. As expected, there can be no unique solution that fits everyone's needs and constraints in this setting, especially when the details of the underlying technologies themselves have not been fully resolved. Consequently, the trade-offs upon which the distributed geo-processing choices are evaluated will keep evolving as different parts of the market gravitate towards technical solutions that fit their needs.
Therefore, a framework that allows developers and users to navigate through the complexities is valuable, because it provides them with an easy-to-follow roadmap that facilitates their search for a solution that best fits their needs with minimal sacrifices. Such a solution should fit within an interoperable world of GIS processing while being consistent with broader IT and web technologies.

5.2 Navigation Framework (Roadmap)

In this section, we present the framework developed after researching the issues, choices and trade-offs involved in building and using a distributed geo-processing infrastructure. This framework is aimed at helping users and developers sort through the choices available to them. It is also intended to assist potential service providers in identifying the nature and combination of components they are best positioned to serve and specialize in. As we present the framework, the reader will notice that many of the individual issues facing users and developers do not significantly differ from issues facing the IT community in general. However, we argue that the way these issues are addressed in the GIS context may be different, given the distinctiveness of the GIS problem (see Section 3.5.4), especially in terms of the potential high bandwidth requirements, large sizes of data elements and the various abstraction levels at which the same data can be used.

5.2.1 Description of Framework

We group the key issues that need to be addressed by users and service providers along five dimensions:

1. Application environment (see Table 5.1): The application environment of the user or the developer affects the ease with which new applications and clients can be introduced or used. In the GIS context, it also affects the ease with which geo-data can be integrated with other non-spatial data.

2.
Data characteristics (see Table 5.2): The nature of the data as well as its typical uses within an application greatly affect the design of the application, both in terms of the extent to which data is distributed, and the formats used to represent, save and access that data.

3. Service characteristics (see Table 5.3): These characteristics refer to those of the services that best suit the application environment and data requirements. Service characteristics are further shaped by the nature and constraints of potential client(s) accessing the service.

4. Client characteristics (see Table 5.4): The range of potential clients as well as the constraints of individual clients determine what type of services and data are needed, and how to best combine them in order to meet these constraints.

5. Standards (see Table 5.5): Standards are the glue that allows components of an infrastructure to effectively work together.

In Tables 5.1-5.5, we identify, within each dimension, a basic set of questions to guide the users of the framework in articulating their needs and constraints. For each dimension, we also explore the available choices, their advantages (denoted by +), their drawbacks (denoted by -), and present our recommendations (denoted by *) based on the findings of this research.

Table 5.1: Framework Dimension 1: Application environment.

Key questions (APPLICATION ENVIRONMENT):

Current IT infrastructure
- Are the systems vertically or horizontally integrated?
- How rooted are legacy systems in the organization?

Applications
- To what degree does spatial data need to be integrated with existing databases?
- How likely is it that applications will access different/distributed repositories of data?

Implementation choices
- How flexible is the organization in terms of technological choices? Is the organization committed to specific vendors?
- Are our current requirements best addressed using vendor-provided solutions?
- Do interoperable services meeting our requirements exist in the market?
- Do we have the expertise to integrate these services in-house? Can we outsource that integration?
- Do we have the expertise to develop a complete solution in-house? Do we have the capabilities to maintain the services and data?
- What are our timeframe and budget constraints?

Choices, trade-offs and recommendations:

Go with a vendor-provided solution
+ Can provide a customized package for what is needed.
- Restrictive approach; may be difficult or impossible to add new functional clients with minimal effort.
* In many cases, it is the only way big organizations can provide a coherent package of geo-processing services across departments.

Develop the solution in-house
+ Allows customization and optimization of systems.
- Requires extensive investment, IT talent, maintenance and expertise.
* Allows implementation of simple tasks simply and today. Requires special attention to ensure that the solution is easily extensible. It can be used while waiting for upcoming standards.

Use existing service provider(s)
+ Provides you with flexibility to mix and match services, and frees you to focus on your area of expertise. The cost of the service is distributed among users.
- May still require customization, and involve integration complexities. Potential problems with reliability, accountability, data integrity and security.
* Use only if services and their results can be easily integrated (requires standards). Especially recommended when services are tuned to specific application domains. Use if the back end is expensive.

Table 5.2: Framework Dimension 2: Data characteristics.

Key questions (DATA CHARACTERISTICS):

Data needed for current and potential applications
- What is the type of data needed? (archived, collections of features, maps)
- How large is the size of the data typically needed for a task?
- How frequently is the data updated?
- How specialized is the data? How localized?

Typical uses of the data
- How frequent is the use of the data?
- Do we typically use the latest versions or do we need older ones as well?
- Are there any accuracy or performance constraints?
- Is it likely to be shared by other groups/organizations for different purposes?
- How much control do we need to have over the data?

Choices, trade-offs and recommendations:

Keep a local copy of the data
+ Provides faster access, especially for larger data, and more control.
- Requires maintenance and updates.
* Keep a local copy if the data doesn't change frequently, when all the topology details are needed, or when typical requests are huge.

Access on demand
+ Only the data needed is extracted, on demand.
- Less control over the data (formats, styles, display); requires the right set of partners and standards.
* Recommended when data changes frequently, and/or when a variety of client applications need to access these data repositories.

Use a proprietary format
+ Can be optimized for internal purposes.
- Harder to integrate with other data types.
* Use when the client is tied to the back-end, and/or the data does not need to be integrated with other data.

Use GML
+ Interoperable, easier to integrate data, extensible.
- Might be too complex for simple operations, cannot accommodate all data models, and can be lossy.
* Use when geo-data needs to be integrated with other sets, and/or needs to be used for a variety of applications.

Table 5.3: Framework Dimension 3: Service characteristics.

Key questions:

Resource requirements
- How expensive is the service to set up and maintain?
- How computationally intensive is the service?
- How difficult is it to expand the number of users accessing the service?

Use in applications
- Is the service more likely to be stand-alone or does it need to be integrated with other services?
- How many applications/clients need the service now and in the future?
(SERVICE CHARACTERISTICS)

Type of service
- Is it a commodity service, or is it a special service used in certain domains?
- How difficult is it to be customized?
- Should we use/implement a map, feature or coverage service?
- Is it typically used with other services in certain combinations? Is it more efficient to offer an aggregate service instead?
- Do we have expertise in a specific domain? Do we have access to some local data or services? Is it more efficient to offer a mediating service instead? Can we find partners for complementary services/data?
- Do we have a lot of data? Is it worthwhile to construct a catalog for these services?

Choices, trade-offs and recommendations:

Map server
+ Easy to use; graphically formatted results can be easily used in thin clients such as web browsers.
- Provides users with no control over symbology or display characteristics; transparency issues when multiple layers are fetched.
* Use when using simple clients like web browsers, and/or when only pictures of maps are needed.

Feature server
+ XML-encoded results, smaller size, more control over display properties.
- Requires clients to understand and handle feature characteristics.
* Use when features are needed, when the client can render the results locally, or when control is needed over how the information is filtered and displayed at the client.

Coverage server
+ Provides users with access to the raw data; maximum control.
- Requires a thicker client; still lacks standards.
* Use when the raw data needs to be manipulated locally by the client.

Basic (self-contained) service
+ Allows users to access and use repositories of information for a variety of applications; maximum flexibility.
- May involve complex integration efforts.
* Use if the number of services is small. Especially recommended for data services.

Aggregate service
+ Easier to use for simple combinations.
- Complicates scalability of capabilities, and has issues of metadata and transformations.
* Use if it can expand the number of applications that can use your basic services (e.g., coordinate transformation).

Mediating service
+ Provides tuned services as well as access to specialized services.
- Requires the right set of partners to deliver such services.
* Use if you have/need specialized expertise in a niche market.

Table 5.4: Framework Dimension 4: Client characteristics.

Key questions (CLIENT CHARACTERISTICS):

Range of clients
- How many clients are likely to use the services/data?
- How similar are these client characteristics?
- Who is providing the clients?

Individual clients
- How thick can the client be?
- How smart can the client be?
- What are the client's constraints in terms of performance, bandwidth, display, etc.?

Choices, trade-offs and recommendations:

Thin clients: Thin clients require the back-end services to handle the fetching and processing of data. Thin clients can use mediating services to handle the complexities of locating, combining and rendering any data needed. If the bandwidth of the client is also limited (as is the case with wireless mobile devices), the amount of data sent to the client might need to be minimized.

Thick clients: Thick clients can locally handle all or some of the processing. Thick clients can afford to use coverage services to retrieve information and manipulate it locally. Such clients can also handle service chaining.

Table 5.5: Framework Dimension 5: Standards.

Key questions (STANDARDS):

Timing
- Are current standards sufficient?
- Do we need new standards? Should we wait for them?

Standardization level
- Should the data format be standardized?
- Should the data exchange format be standardized?
- Should the metadata be standardized?
- Should data interfaces be standardized?

Comprehensiveness and extensibility
- How comprehensive should the standard be?
- How simple should it be? How extensible do we need it to be?
Choices, trade-offs and recommendations: Refer to Table 2.3, Table 2.4 and Table 2.5 in Chapter 2.

5.2.2 Applications of the Framework

The navigation framework helps users and developers of a distributed geo-processing infrastructure to first articulate their needs and their constraints, and then evaluate their alternatives based on the trade-offs involved in them. Accordingly, the most appropriate solution to a given problem depends on how that problem is positioned across the five dimensions of the framework. Figure 5.2 illustrates how various applications can be positioned across the two dimensions of client and data characteristics. In this figure, the client is characterized as either thin or thick. The thicker the client, the more power it has to handle and process raw geo-data. The other dimension in Figure 5.2 is the data, which is categorized as static or real-time, depending on the frequency with which it is updated.

Figure 5.2 Sample applications positioned with respect to client and data dimensions. (Thin client, static data: a mobile client (PalmPilot or cellular phone) requesting the nearest McDonald's locations. Thin client, real-time data: a mobile client tracking the location of dispersed cattle in a farm; a mobile client subscribing to a service that signals the nearby presence of a movie star; a mobile client accessing real-time traffic conditions. Thick client, static data: an ArcView extension for retrieving subsets of coverages on demand. Thick client, real-time data: a real-time traffic information system used by transportation planners for local re-routing; weather forecasting.)

The examples shown in Figure 5.2 illustrate how a distributed setup can serve a range of clients and data needs. In many cases, especially for thick clients, a combination of local and distributed data is needed for typical analysis.
Indeed, the value of a distributed setup to thick clients stems from its capability to offer these clients the means to enhance their analysis by accessing distributed data on demand, and juxtaposing it on top of locally available data layers. In Figure 5.2, ArcView is used as an example of a thick client that often requires distributed static data coverages (such as elevation data) to be incorporated into local projects. Other thick clients, such as real-time traffic and weather forecasting information systems, depend more on juxtaposing real-time information (such as traffic conditions and cloud coverage) on top of static data layers (such as the road network and the land-use distribution). Figure 5.2 also provides some interesting examples of location-based services that can be used by thin mobile clients. These range from allowing users to find the locations of nearby restaurants, to informing them about real-time traffic conditions or a nearby sighting of a certain movie star. Indeed, location-based services, especially those related to recreation and transportation, are expected to be among the first and most visible applications benefiting from a distributed geo-processing infrastructure.

5.3 Implications on Required Standards and Protocols for Future Research

Armed with a strong understanding of the issues covered in the navigation framework, we proceed to identify the implications of these issues on the design of standards and protocols that can support scalable, extensible and easy-to-use distributed geo-processing infrastructures. We focus on the standards and protocols within the context of the federated architectural model (described in Section 4.5), since this model already encompasses the other architectures discussed in Chapter 4, and is the most promising in terms of hiding service chaining complexities from the client.
These standards and protocols will ultimately affect the design and complexity of mediating services, as well as shape the nature and the extent of their involvement in dialog exchanges with other components of the architecture. Our next task is to outline a fundamental set of standards and protocols that are needed for an efficient dialog structure among the various components of the architecture. This fundamental set of standards and protocols can then be mapped into a practical pathway that the GIS community may follow to ensure successful, scalable and extensible implementations of distributed geo-processing infrastructures. Such a pathway will serve to highlight the efforts that deserve the most attention in the near term, while promising the largest pay-off to the community in the long run. As stated at the beginning of this thesis, outlining a sustainable pathway is challenging because of the pathway's dependence on continually evolving IT and web technologies. This is where our analysis of the key issues uncovered in this thesis will prove the most valuable, as it will alleviate the challenge of coping with the inevitable uncertainties of IT evolution. According to our earlier analysis, data exchange and message passing standards are the cornerstones of any fundamental set of standards and protocols supporting a sustainable distributed infrastructure. In this section, we discuss some of the choices available for standardizing these cornerstones in the case of GIS. We also briefly discuss the equally important catalog and metadata standardization issues. Throughout the discussion, our main concern is maintaining a reasonable overall level of simplicity. As inferred from the Internet discussion in Section 2.4, simplicity of message passing protocols and exchange data structures can go a long way if they leave enough room for extensibility.
Furthermore, given the overall "no one size fits all" conclusion of previous chapters, we expect that there will be different combinations of standards that will work for different users and applications, as covered in the navigation framework of Section 5.2.

5.3.1 Data Format and Exchange Standards

In keeping with the goal of fitting with today's evolving web technologies, an XML-based standard for exchanging geo-data in a distributed environment is very promising. With XML currently at the heart of web technologies, the ongoing work on GML (see Section 3.5.1) is indeed timely. According to Section 2.2.2, the "success" of GML depends on four factors, namely: the process followed to design it, its timing with respect to other efforts in the field, the standardization level it addresses, and the scope of functionalities it offers. If "successful", GML can have a great impact on sharing and linking distributed geographic datasets, as well as integrating them with other non-spatial data. It also allows GIS applications to leverage a growing number of XML tools and technologies for data visualization (e.g., SVG and VML), data transformations (e.g., XSLT), schema expression (e.g., XML schema and RDF), and data querying (e.g., XQL). Following our exposure to GML in the prototyping experiment (see Chapter 3), we expect that the extent to which GML will be successful will largely depend on the controversial issue of how simple it ought to be set initially. Indeed, in the OGC circle, there is a tension between the view of GML as a format that can efficiently incorporate the richness of most current data models, and the view of GML as merely providing a means for accessing large complex databases. On one hand, if GML is to have built-in mechanisms for handling the various data models, then the interface risks becoming too complex for simple purposes, hence limiting its utility.
On the other hand, if GML is set as a least common denominator for these models, it risks missing various aspects of these models, hence undermining its usefulness in certain cases. In light of our analysis in this thesis, we anticipate that a simple, not necessarily all-inclusive GML will prove more valuable to a wider audience of users. Indeed, in most typical cases, users will be willing to forego some specific data model details for the benefit of effortlessly being able to integrate their data with other datasets. In this case, GML is considered more of an exchange standard than a data storage standard for GIS data. Consequently, we expect users to continue to store their data in its current local form, and only transform it to XML as needed. Despite its advantages, an XML-based standard still carries some drawbacks. One issue with XML encoding is that the auxiliary XML tags describing and surrounding the actual data are often more voluminous than the data itself. A case in point is that the GML encoding of even simple collections of geometries, as illustrated in the example below, may still require an entire page of XML tags to fully describe them.

<GeometryCollection srsName="EPSG:4326">
  <geometryMember>
    <Point>
      <coordinates>50.0,50.0</coordinates>
    </Point>
  </geometryMember>
  <geometryMember>
    <LineString>
      <coordinates>0.0,0.0 0.0,50.0 100.0,50.0 100.0,100.0</coordinates>
    </LineString>
  </geometryMember>
  <geometryMember>
    <Polygon>
      <outerBoundaryIs>
        <LinearRing>
          <coordinates>0.0,0.0 100.0,0.0 50.0,100.0 0.0,0.0</coordinates>
        </LinearRing>
      </outerBoundaryIs>
    </Polygon>
  </geometryMember>
</GeometryCollection>

In such simple cases, XML encoding can be inefficient and cumbersome. However, it can be argued that this is a fair price to pay for an interoperable solution, especially since XML data can be easily compressed, and since parsing its content can be efficiently performed by machine parsers that are totally transparent to users.
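To make the compression point concrete, the following Python sketch (our own illustration, using only the standard library) gzips the geometry collection shown above; the repetitive tag structure makes the encoding highly compressible, and the round trip is lossless:

```python
import gzip

# The GML geometry collection from the example above, as one string.
gml = (
    '<GeometryCollection srsName="EPSG:4326">'
    "<geometryMember><Point><coordinates>50.0,50.0</coordinates></Point>"
    "</geometryMember>"
    "<geometryMember><LineString><coordinates>"
    "0.0,0.0 0.0,50.0 100.0,50.0 100.0,100.0"
    "</coordinates></LineString></geometryMember>"
    "<geometryMember><Polygon><outerBoundaryIs><LinearRing><coordinates>"
    "0.0,0.0 100.0,0.0 50.0,100.0 0.0,0.0"
    "</coordinates></LinearRing></outerBoundaryIs></Polygon></geometryMember>"
    "</GeometryCollection>"
)

raw = gml.encode("utf-8")
packed = gzip.compress(raw)

# The repetitive tags compress to a fraction of the raw size,
# and decompression recovers the document exactly.
print(len(raw), len(packed))
assert gzip.decompress(packed) == raw
```

For larger GML documents the tag-to-data ratio, and hence the benefit of compression, tends to be even greater.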
In order for GML to gain wide acceptance as a geo-data exchange standard, we argue that it needs to offer ways for accommodating the exchange of raw binary data. This can be done by packaging binary data, as is, in a GML document that only uses XML to convey metadata information about the characteristics of that data. Such an approach can be easily incorporated into the coverage extensions of GML (currently being drafted at OGC). It remains to be seen if and how this can be achieved without making GML more complex than it needs to be. Given the many uncertainties surrounding GML, it is still too early to predict its future success. This is especially true considering today's broad world of other XML-based efforts, some of which, such as VML, SVG and VRML, may be used in GIS. The outcomes of these efforts may prove sufficient for the general GIS user community, and hence bypass GML to become de-facto standards for geographic data exchange.

5.3.2 Service Chaining and Data Passing

Throughout this thesis, service chaining issues have dominated our discussions as the basis for efficient use and combination of resources in a distributed environment. It therefore follows that, to a great extent, the shape of distributed geo-processing will depend on the technologies selected for expressing, exchanging and executing service chains in this distributed environment. When considering candidate technologies, the focus ought to remain on exploiting broader IT and web technologies that exhibit an adequate degree of extensibility, and that are effective in minimizing potentially voluminous data transfers among services and maintaining a minimal level of complexity. For the reasons outlined in the previous section, we find XML to be a suitable technology for expressing and exchanging service chains within distributed infrastructures.
More specifically, we argue that XML's inherent capabilities for describing nested information, coupled with its intelligent linking features (e.g., XLink and XPointer), offer the right set of tools for achieving these purposes. As an example, consider the interaction between the client and the mediating service in the federated architecture depicted in Figure 4.6. In this example, the mediating service can communicate to the client a potential service chaining solution, by using XML to encode the sequence of selected services as nested lists of (XLink) pointers to these services and their parameters. The value of XML's linking features is even greater considering their usefulness in significantly reducing the volume of data transfer among various components of the infrastructure, by allowing these components to exchange, when possible, pointers to data instead of that data itself (as in Figure 3.7). Figure 5.3 illustrates the data flow after applying this method of information exchange to the example described in Section 4.2.1. The client in Figure 5.3 is shown overseeing the execution of a chain it received as an XML document from a mediating service. In order for this scenario to work, the mediating service and the client have to agree on a DTD for the XML encoding of nested service chains. In many cases, however, the client will be incapable of handling the execution of service chaining (see Section 4.5.3). In these cases, it will be the responsibility of the mediating service to execute the chains and return the appropriate result to the client. Figure 5.4 shows an example of such a client requesting an image that needs to be assembled from three map servers. In this figure, the mediating service is shown initiating the execution of the chain and returning the composite image to the client.
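As a concrete, entirely hypothetical illustration of the nested-chain encoding described above, the sketch below expresses a re-projection / ortho-imagery / address-matching chain as nested service elements carrying xlink:href pointers, and shows how a client overseeing the chain could derive the innermost-first execution order. The element names, attributes and URLs are our own inventions, not part of any DTD the mediating service and client would actually agree on:

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# A hypothetical chain document: the re-projection service consumes the
# output of the ortho-imagery service, which in turn consumes the output
# of the address-matching service.
chain = """<serviceChain xmlns:xlink="%s">
  <service xlink:href="http://example.org/reproject" srs="EPSG:26986">
    <service xlink:href="http://example.org/ortho" layer="imagery">
      <service xlink:href="http://example.org/geocode" address="77 Mass Ave"/>
    </service>
  </service>
</serviceChain>""" % XLINK

def execution_order(xml_doc):
    """Return service URLs innermost-first, i.e. the order in which a
    client overseeing the chain would invoke them."""
    root = ET.fromstring(xml_doc)
    order = []
    def visit(elem):
        for child in elem:          # descend before recording the parent
            visit(child)
        href = elem.get("{%s}href" % XLINK)
        if href:
            order.append(href)
    visit(root)
    return order

print(execution_order(chain))
# ['http://example.org/geocode', 'http://example.org/ortho', 'http://example.org/reproject']
```

Because each element carries only a pointer and its parameters, the chain document stays tiny no matter how voluminous the data flowing between the services is.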
Instead of downloading the three constituent images and then sending them to the mosaicking service, the mediating service in this example delivers image pointers to the mosaicking service, which, in turn, uses these pointers to retrieve the images before mosaicking them.

Figure 5.3 Example 1: Using XML to minimize data transfer in federated architectures. (The figure shows the client and the smart server exchanging a request, capabilities negotiations and an XML response; the client then issues requests to, and receives data from, the re-projection, ortho-imagery and address-matching services. The figure only shows the interactions among the smart server, the client and the independent services; the catalog interactions are the same as in Figure 4.6. Thick lines indicate data transfer while thin lines indicate service requests. Data transfer is further minimized by leveraging service closure characteristics; see Section 4.3.2.)

Figure 5.4 Example 2: Using XML to minimize data transfer in federated architectures. (The figure shows the smart server mediating between the client, three map servers and the mosaicking service.)

The examples above illustrate the usefulness of exploiting XML technologies in the implementation of geo-processing service chaining. By using such broad technologies, fewer GIS-specific capabilities and skills will be required in the process of designing and implementing mediating services. This in turn will accelerate the rate at which they are developed and introduced to the market, hence fueling a faster growth of distributed geo-processing. However, as discussed in Section 4.5, certain aspects of mediating services will remain dependent on domain-specific expertise as well as on the capabilities of the services accessible to them.

5.3.3 Message Passing and Dialog Structure

In this section, we examine the message passing dialog that takes place between a mediating service and basic services, while adhering to our general goal of minimizing the volume of transferred data.
In particular, we focus our attention on the phase of the dialog in which the mediating service queries basic services about their capabilities. As described in Section 2.3.4, the mediating service retrieves the capabilities of a service by issuing a getCapabilities request to that service. According to Appendix B.2, the service responds to the getCapabilities request by returning a machine-parseable listing of all interfaces supported by that service. In the case of map services, getCapabilities additionally returns the layers of data offered along with a listing of the formats, styles and projections supported for each layer (see example in Appendix B.2). We argue that, although the simplicity of such a capabilities retrieval model is desirable, its scalability suffers when the size of retrieved capabilities is large. This observation is likely to be a major issue when dealing with aggregate map services (also known as cascading map servers within the OGC realm[1]). Since such services report the capabilities of other services as their own, it is not surprising that their capabilities consist of long lists of layers, with each layer accompanied by its respective long lists of projections and formats. For most layers, these lists of projections and formats will considerably overlap, resulting in large volumes of redundant information in large capabilities files. To give the reader a feel for how large these files can be, we introduce the example of an OGC-compliant cascading map server, provided by Cubewerx, Inc. (www.cubewerx.com). This service was used in the Web Mapping Testbeds to provide access to data layers from more than twenty map servers in a dozen SRSs. The size of the resulting capabilities file was on the order of 1 MB (i.e., approximately 400 pages of XML code).
According to the OGC Web Mapping Testbed glossary, a cascading map server "can report the capabilities of other map servers as its own and transform layers from those map servers into different projections and formats, even if those map servers cannot serve those projections and formats themselves".

It may be argued that, in comparison to the typical sizes of exchanged geographic data, the exchange of an additional 1 MB capabilities file is tolerable. However, capabilities can grow exponentially as the combinations of possible layers, projections and formats quickly multiply. The size of the capabilities will also increase with the number of map servers that are cascaded. This is a concern in light of the expected reliance on cascading services sourcing from an even larger pool of services in an open distributed geo-processing infrastructure. Under these circumstances, it will not be surprising to see capabilities files that are much larger than the 1 MB file in our example. The large file sizes, coupled with the expected high frequency at which capabilities files are exchanged in the infrastructure, argue for a more scalable solution. A simple strategy for handling some of these scalability issues is a multistep process that begins by consolidating the individual lists of projections into a single master projection list that is free of redundancy and repetition. Similar master lists may be created for supported formats or styles. The second step consists of identifying the subsets of these master lists that are most likely to be used. In response to a first query by a mediating service, only these subsets are returned. If the mediating service is unsuccessful in satisfying a request for a layer based on the initial set of capabilities, it then issues another request to the cascading service, upon which the latter returns a new list of capabilities that are more specific to the layer in question (see example in Figure 5.5 below).
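The consolidation step can be sketched as follows; the layer names and SRS codes are illustrative only. Each per-layer projection list is replaced by a set of indices into a single, redundancy-free master list.

```python
# Sketch of the master-list consolidation described above (hypothetical
# data): collapse the per-layer projection lists of a cascading map
# server into one redundancy-free master list, then describe each layer
# by small references into it.

def build_master_list(layer_projections):
    """layer_projections maps layer name -> list of SRS codes."""
    master = sorted({srs for srs_list in layer_projections.values()
                     for srs in srs_list})
    index = {srs: i for i, srs in enumerate(master)}
    # Each layer is now described by indices into the master list,
    # so a shared projection is spelled out only once.
    per_layer = {layer: [index[srs] for srs in srs_list]
                 for layer, srs_list in layer_projections.items()}
    return master, per_layer

layers = {
    "ortho":     ["EPSG:4326", "EPSG:26986", "EPSG:26917"],
    "graticule": ["EPSG:4326", "EPSG:26986"],
}
master, per_layer = build_master_list(layers)
```

The same routine applies unchanged to format and style lists.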
A similar dialog may occur between the mediating service and the cascading service regarding other parameters of a layer, such as its format or style.

[Figure 5.5 diagram (the circles represent the projections/formats at which the cascading service can provide each layer):
1. Assume the cascading service cascades four services A, B, C and D.
2. In response to a first request by a mediating service, the cascading service only returns the projections/formats of the shaded area.
3. The mediating service uses the initial capabilities to request layer B at projection x, which results in an error.
4. The mediating service sends another request to the cascading service, upon which the cascading service returns the capabilities set of layer B.]

Figure 5.5 Reducing redundancy in capabilities retrieval.

We note that a simple sequencing strategy, such as the one described above, indeed succeeds in avoiding redundancy and repetition in cascading service capabilities, especially when the underlying services have similar characteristics. However, this strategy fails to tackle the more serious problem of the potential exponential growth of the capabilities. We argue that, in order to ensure scalability, a more sophisticated dialog structure is needed. It would replace the current process of learning about service characteristics by requesting and parsing their capabilities files. In the new dialog structure, a mediating service may send successive inquiries (with yes/no answers) to a service to determine its fitness for a particular purpose. The outcome of each inquiry is used by the mediating service to refine and focus the next inquiry. Ensuring the efficiency and scalability of such a sophisticated dialog structure would require an additional level of standardization that is expected to take time to evolve.
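Such an inquiry-based dialog might look as follows. This is a sketch under assumed, not-yet-standardized interfaces: the class and method names are hypothetical, and the yes/no "wire protocol" is reduced to a local method call for illustration.

```python
# Illustrative sketch of the proposed inquiry-based dialog: the mediating
# service asks narrow yes/no questions instead of fetching a full
# capabilities file, and uses each answer to focus the next inquiry.

class CascadingService:
    def __init__(self, capabilities):
        # capabilities: {layer: {"srs": set of codes, "formats": set}}
        self._caps = capabilities

    def can(self, layer, srs=None, fmt=None):
        """Answer one yes/no inquiry about a layer."""
        entry = self._caps.get(layer)
        if entry is None:
            return False
        if srs is not None and srs not in entry["srs"]:
            return False
        if fmt is not None and fmt not in entry["formats"]:
            return False
        return True

def negotiate(service, layer, preferred_srs, preferred_formats):
    """Refine inquiries until a usable (srs, format) pair is found."""
    if not service.can(layer):                 # inquiry 1: layer offered?
        return None
    for srs in preferred_srs:                  # next inquiries: narrow SRS
        if service.can(layer, srs=srs):
            for fmt in preferred_formats:      # then narrow the format
                if service.can(layer, srs=srs, fmt=fmt):
                    return srs, fmt
    return None

svc = CascadingService({"ortho": {"srs": {"EPSG:26986", "EPSG:4326"},
                                  "formats": {"GIF", "PNG"}}})
result = negotiate(svc, "ortho", ["EPSG:26917", "EPSG:4326"], ["PNG"])
```

Each exchanged message is a few bytes, at the cost of more round trips; this is exactly the efficiency trade-off that would need standardization.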
5.3.4 Other Implications

The above discussion on data exchange and message passing issues mainly covers the interactions between the client, the mediating service(s), and the individual services in the federated architecture. To complete the discussion of the implications of our research on required standards and protocols, this section briefly recounts some of the standardization aspects related to the remaining component of the architecture, namely the catalog. With the catalog at the center of the federated architecture, these aspects include:

- Standardizing the categorization of services in catalogs so as to facilitate the search for services according to the category they belong to. At the simplest level, the categorization may involve grouping services into basic types such as map services, feature services, coverage services and geo-processing services. As more services become available, it will be necessary to subdivide these categories further according to the geographic coverage of data servers, and the specific functionality and application domain of geo-processing services. In Section 4.4.3, we also found it useful to categorize services according to their provider, performance, price offerings, etc.

- Standardizing the catalog interfaces which allow users and other components to query and retrieve metadata, as well as add and update service entries. The OGC has been actively involved in this area, and has been working with the FGDC and ISO/TC211 on Catalog Services Interfaces and Abstract Specifications (see http://www.opengis.org/techno/request.htm). For the same reasons outlined in Section 5.3.1, and especially given the advances in XML Schema and XQL, XML is expected to play an important role in standardizing the catalog interfaces.

- Standardizing the return format and structure of query results. These results may consist of one or more addresses of services, possibly sorted according to their relevance to the query criteria.
The text-based nature of XML, as well as its indirection capabilities, makes XML a good choice for representing such information in the form of a list of pointers to services.

The above discussion tackles catalog standardization issues from a purely technical interoperability perspective (see Section 2.2.1). However, it has also been established that effective sharing and re-use of data among organizations requires a minimal level of semantic interoperability [52]. It is for this reason that the specialization of mediating services as well as catalogs is needed in the distributed infrastructure. Addressing semantic interoperability also calls for organizations in various industries to team up and establish standard vocabularies (in the form of XML DTDs and Schemas) in order to facilitate data sharing and re-use within and outside their respective industries. It remains to be seen how and when such issues will be resolved, and how they will affect the evolution of distributed GIS.

5.4 The Future of GIS Technologies and Markets

In summary, this research is motivated by the promising prospects of distributed GIS computing. With the growing amount of data available and the rising interest in using that data in a variety of settings, accompanied by advances in web, e-commerce and mobile technologies, the market is clearly gravitating towards a more distributed model of GIS. The question is no longer whether a distributed geo-processing infrastructure is appropriate, but rather how it can be realized, and what issues need to be accounted for in order for it to materialize and grow in a scalable and extensible fashion. In this section, we use our understanding of the issues analyzed in this thesis to speculate on the future dynamics of the GIS marketplace. We also highlight potential challenges and opportunities that are likely to affect the ultimate shape and growth path of distributed geo-processing.
5.4.1 Dynamics of the Future GIS Marketplace

The unbundling of GIS systems into independently-provided interoperable components, and the delivery of subsets of GIS data to users on demand, will lead to significant changes in the GIS marketplace. Figure 5.6 outlines a potential value chain for the future GIS marketplace. In this new distributed environment, the private sector as well as the public sector at the local, state and federal levels will all likely contribute to establishing and maintaining a national GIS infrastructure. The different players in the value chain will share different responsibilities according to their expertise. For instance, as discussed in Section 4.4.3, governmental agencies are well positioned to offer and maintain public data covering their areas of jurisdiction. National agencies such as NASA and NIMA (see Section 2.1.3) can also support and leverage such a distributed infrastructure by providing access to their data via interoperable interfaces, and by using these interfaces for accessing each other's data. In the private arena, satellite imagery providers are likely to follow an e-commerce model for providing users with on-demand access to their huge repositories of data.

[Figure 5.6 diagram: a value chain linking Infrastructure Providers (e.g., MCI, AT&T), Data Producers, Service Providers, Integrators and Service Brokers. Data producers offer frequently refreshed services and data through a large number of small transactions; integrated/cascaded offerings can be customized for individual clients; service providers offer specialized services for niche markets, most relying on integrators and service brokers to distribute their products; service brokers are search engine-like services that enable clients to search for and locate services, and mix and match them to solve their problems.]

Figure 5.6 Potential value chain for the future GIS marketplace.

With the unbundling of GIS, it will not be necessary for players to build comprehensive systems in order to gain a share of the market.
The new environment will open the door for small niche players to enter this market with application-specific offerings that leverage their understanding of particular industries or processes. In Section 4.5.3, we saw that the need for mediating services to coordinate service chaining will provide huge market entry opportunities for these new players. Nonetheless, these opportunities will be limited by the availability of data/service repositories and catalogs in the market, unless the new players provide their own suites of these services. However, even though these players possess the expertise in their respective arenas, they will not necessarily be interested in supporting the requisite back-end services. Instead, it is more likely that they will wait for enough services to become available on the market, and select partners from the players that provide them. The above dynamics and the new player opportunities particularly apply in the case of location-based services. Given the expected demand for location-based services and the rapid expansion of the wireless market, location-based services have the potential of becoming the killer applications that will drive the construction and delivery of various components of federated geo-processing infrastructures. Indeed, we are currently witnessing the emergence of many early versions of such location-based services on the market (see Figure 5.2). However, the underlying architectures supporting such services may not necessarily scale well when more complex services are needed, or when these services require connecting among distributed resources that are not easily put together in one place. The need for addressing these more complex requirements opens the door to independent service providers for supplying the data, the services, as well as the mediating components to coordinate among various services.
Furthermore, new opportunities may be available for some service providers to target niche markets in cases where the back-end services are expensive, where service chaining requires specific domain expertise, or where the data provided is sensitive to local context and subcultures. Finally, in terms of the reaction of traditional GIS systems providers in the face of the new competition, we expect them to adapt their business models by offering access to components of their systems through portal-like applications. Until now, the traditional players have been intentionally slow at aggressively developing thin-client applications in order to protect their systems. In order to compete, these players will leverage their established brand names as well as their connections with their current customers. However, in order to maintain their current investments in their clients, we anticipate that the traditional players will tune their services to perform better when coupled with their own clients. As mentioned in Section 2.5.3, a distributed service model will not fully replace the core GIS systems traditionally supplied by the big players. Those core systems will always be needed for building new GIS data. As shown in this thesis, the nature of distributed geo-processing in the future depends on many factors. In the following section, we discuss some of the challenges that may still need to be addressed, and some of the opportunities that are likely to accelerate the growth of distributed geo-processing.

5.4.2 Challenges and Opportunities

The future of distributed geo-processing holds many opportunities that are shaped by the issues discussed in Section 5.3.
From the technical perspective, the eventual shape and growth path of distributed geo-processing will be determined by the outcome of the GML standardization process, the extensibility and scalability of the data and message passing protocols employed in service chaining, and their impact on the design and complexity of catalogs and mediating services. On one hand, the future of distributed geo-processing will depend on the long-term viability of the coopetition model exercised within the OGC, and the outcome of its testbed approach for standardizing web mapping (see Section 2.3.4). On the other hand, the construction and proliferation rates of distributed geo-processing will depend on the evolution of broader IT and web technologies (see Sections 2.5 and 2.3.5), and the extent to which they can be leveraged in the GIS arena. For example, many of the technologies used in the ASP field, especially those related to billing, security and authentication, may be directly applicable in the case of GIS. Similarly, distributed geo-processing may also benefit from the increasing interest within the software engineering world in adapting current object-oriented design tools (such as UML, the Unified Modeling Language) for designing XML Schemas [12]. With the growing interest in using UML for modeling and automating the flow of XML data and processing, a new generation of XML modeling technologies is on the horizon. These technologies can potentially be used in mediating services for providing dynamic service chaining to their clients. The use of such general-purpose tools will undoubtedly accelerate the development of mediating services, and consequently, the evolution of distributed geo-processing. Finally, from the users' perspective, adopting and using the distributed geo-processing model will depend on the public and commercial availability of reliable interoperable geographic information services.
Furthermore, as discussed in Section 2.1, distributed geo-processing will become more attractive to users as more bandwidth becomes available via technological innovations such as Internet II and the Next Generation Internet [76]. However, although these technologies promise to dramatically increase bandwidth over the years, the market for wireless location-based services will continue to be constrained by practical limitations on bandwidth, performance and connectivity [61]. Given the upside potential of the location-based services market (see Section 2.1.6 and Section 5.4.1), these practical issues should remain a priority. In summary, it is evident, from both the business and technical perspectives, that the future of distributed geo-processing is indeed promising. It remains to be seen how and when it will come about.

[Appendix figure: the Traditional GIS Model. The user must: identify and load the needed data layers for analysis; locally prepare the data for analysis (re-project layer 1 into the desired coordinate system, extract the needed coverage from layer 2, address match table 1, and extract and mosaic orthophotos from large locally stored tiles); and only then begin the analysis.]

Appendix B Map and Capabilities Request Specifications

B.1 Map Interface

The Map interface is designed to provide clients of a map server with pictures of maps, possibly from multiple map servers. Upon receiving a Map request, a map server must either satisfy the request or throw an exception in accordance with the exception instructions. Table B.1 provides an overview of the Map request parameters, followed by an example of how it is used. For more information on the latest specifications or updates, refer to the full specification document available at www.opengis.org/techno/specs/.

Parameter / Description:

http://server-address/path/script?
    URL prefix of map server.
WMTVER = 1.0.0
    Request version, required.
REQUEST = map
    Request name, required.
LAYERS = layer_list
    Comma-separated list of one or more map layers, required.
STYLES = style_list
    Comma-separated list of one rendering style per requested layer, required. Examples of styles: points, contours, reference.
SRS = srs_identifier
    Spatial Reference System (a text parameter that names a horizontal coordinate reference system code), required. Two namespaces are defined: EPSG and AUTO. Map servers advertise their SRSs in their capabilities documents.
BBOX = xmin,ymin,xmax,ymax
    Bounding box corners in SRS units, required.
WIDTH = output_width
    Width in pixels of map picture, required.
HEIGHT = output_height
    Height in pixels of map picture, required.
FORMAT = output_format
    Output format of map, required. The formats are divided into four basic groups: picture formats (GIF, JPEG, TIFF, PNG, etc.), graphic element formats (WebCGM, SVG), feature formats (GML), and other formats (MIME, INIMAGE, etc.).
TRANSPARENT = true_or_false
    If TRUE, the background color of the picture is made transparent if the image format supports transparency. Default = FALSE.
BGCOLOR = color_value
    Hex value for the background color, optional.
EXCEPTIONS = exception_format
    Format in which exceptions are to be reported by the server: XML document or INIMAGE (default = INIMAGE, where the error message is returned graphically to the user), optional.
Vendor-specific parameters

Table B.1: The Map request.

An example of the use of the Map request would be:

http://b-maps.com/map.cgi?WMTVER=1.0.0
    &REQUEST=map
    &SRS=EPSG%3A4326
    &BBOX=-97.105,24.913,78.794,36.358
    &WIDTH=560
    &HEIGHT=350
    &LAYERS=BUILTUPA
    &STYLES=0XFF8080
    &FORMAT=PNG
    &BGCOLOR=0xFFFFFF

B.2 Capabilities Interface

The capabilities interface is designed to provide clients of map servers with a machine-parseable listing (an XML document) of what interfaces a map server supports, what map layers it can serve, what formats it can serve them in, etc.
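The two interfaces fit together naturally on the client side: a client parses the capabilities XML to learn what a server offers, then composes a Map request from the Table B.1 parameters. The following is a minimal sketch, with an abbreviated, hypothetical capabilities document and an empty STYLES value as simplifications:

```python
# Sketch of a client-side helper: extract a layer's advertised SRSs from
# a capabilities document, then compose a Map request URL.
import urllib.parse
import xml.etree.ElementTree as ET

CAPS = """<WMT_MS_Capabilities version="1.0.0">
  <Capability>
    <Layer>
      <Title>Demo Server</Title>
      <Layer queryable="0">
        <Name>ortho</Name>
        <SRS>EPSG:26986</SRS>
        <SRS>EPSG:4326</SRS>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""

def layer_srs(caps_xml, layer_name):
    """Collect the SRS codes advertised for a named layer."""
    root = ET.fromstring(caps_xml)
    for layer in root.iter("Layer"):
        name = layer.find("Name")
        if name is not None and name.text == layer_name:
            return [srs.text for srs in layer.findall("SRS")]
    return []

def map_request(server, layers, srs, bbox, width, height, fmt):
    """Compose a Map request URL from the Table B.1 parameters."""
    params = {"WMTVER": "1.0.0", "REQUEST": "map",
              "LAYERS": ",".join(layers), "STYLES": "",
              "SRS": srs, "BBOX": ",".join(str(v) for v in bbox),
              "WIDTH": str(width), "HEIGHT": str(height), "FORMAT": fmt}
    return server + "?" + urllib.parse.urlencode(params)

srs_list = layer_srs(CAPS, "ortho")
url = map_request("http://b-maps.com/map.cgi", ["ortho"], srs_list[0],
                  (-71.1, 42.3, -71.0, 42.4), 560, 350, "PNG")
```

Note that urlencode escapes reserved characters, producing, e.g., SRS=EPSG%3A26986, exactly as in the example request above.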
Below is a simplified version of the DTD for version 1.0.0. For the full version and the latest updates, refer to http://www.digitalearth.gov/wmt/xml/.

<!ENTITY % KnownFormats " GIF | JPEG | PNG | WebCGM | SVG | GML.1 | GML.2 | GML.3 | WMS_XML | MIME | INIMAGE | TIFF | GeoTIFF | PPM | WBMP | BLANK " >

<!-- The Service element provides metadata for the service as a whole. -->
<!ELEMENT Service (Name, Title, Abstract?, Keywords?, OnlineResource, Fees?, AccessConstraints?) >
<!ELEMENT Name (#PCDATA) >
<!ELEMENT Title (#PCDATA) >
<!ELEMENT Abstract (#PCDATA) >
<!ELEMENT Keywords (#PCDATA) >
<!ELEMENT OnlineResource (#PCDATA) >
<!ELEMENT Fees (#PCDATA) >
<!ELEMENT AccessConstraints (#PCDATA) >

<!ELEMENT Capability (Request, Exception?, VendorSpecificCapabilities?, Layer?) >

<!-- Available WMT-defined request types are listed here. -->
<!ELEMENT Request (Map | Capabilities | FeatureInfo)+ >
<!ELEMENT Map (Format, DCPType+) >
<!ELEMENT Capabilities (Format, DCPType+) >
<!ELEMENT FeatureInfo (Format, DCPType+) >
<!ELEMENT DCPType (HTTP) >

<!-- Available HTTP request methods. -->
<!ELEMENT HTTP (Get | Post)+ >
<!ELEMENT Get EMPTY >
<!ATTLIST Get onlineResource CDATA #REQUIRED >
<!ELEMENT Post EMPTY >
<!ATTLIST Post onlineResource CDATA #REQUIRED >

<!-- Available formats.
Not all formats are relevant to all requests. -->
<!ELEMENT Format ( %KnownFormats; )+ >
<!ELEMENT GIF EMPTY >      <!-- Graphics Interchange Format -->
<!ELEMENT JPEG EMPTY >     <!-- Joint Photographics Expert Group -->
<!ELEMENT PNG EMPTY >      <!-- Portable Network Graphics -->
<!ELEMENT PPM EMPTY >      <!-- Portable PixMap -->
<!ELEMENT TIFF EMPTY >     <!-- Tagged Image File Format -->
<!ELEMENT GeoTIFF EMPTY >  <!-- Geographic TIFF -->
<!ELEMENT WebCGM EMPTY >   <!-- Web Computer Graphics Metafile -->
<!ELEMENT SVG EMPTY >      <!-- Scalable Vector Graphics -->
<!ELEMENT WMS_XML EMPTY >  <!-- eXtensible Markup Language -->
<!ELEMENT GML.1 EMPTY >    <!-- Geography Markup Language, profile 1 -->
<!ELEMENT GML.2 EMPTY >    <!-- Geography Markup Language, profile 2 -->
<!ELEMENT GML.3 EMPTY >    <!-- Geography Markup Language, profile 3 -->
<!ELEMENT WBMP EMPTY >     <!-- Wireless Access Protocol (WAP) Bitmap -->
<!ELEMENT MIME EMPTY >     <!-- Multipurpose Internet Mail Extensions -->
<!ELEMENT INIMAGE EMPTY >  <!-- display text in the returned image -->
<!ELEMENT BLANK EMPTY >    <!-- return an image with all pixels transparent if supported by the image format -->

<!-- An Exception element indicates which output formats are supported for reporting problems encountered when executing a request. -->
<!ELEMENT Exception (Format) >

<!ELEMENT VendorSpecificCapabilities (your stuff here) >

<!ELEMENT Layer ( Name?, Title, Abstract?, Keywords?, SRS?, LatLonBoundingBox?, BoundingBox*, DataURL?, Style*, ScaleHint?, Layer* ) >
<!ATTLIST Layer queryable (0 | 1) "0" >
<!ELEMENT SRS (#PCDATA) >
<!ELEMENT LatLonBoundingBox EMPTY >
<!ATTLIST LatLonBoundingBox
          minx CDATA #REQUIRED
          miny CDATA #REQUIRED
          maxx CDATA #REQUIRED
          maxy CDATA #REQUIRED >
<!ELEMENT BoundingBox EMPTY >
<!ATTLIST BoundingBox
          SRS CDATA #REQUIRED
          minx CDATA #REQUIRED
          miny CDATA #REQUIRED
          maxx CDATA #REQUIRED
          maxy CDATA #REQUIRED >
<!ELEMENT DataURL (#PCDATA) >
<!ELEMENT Style ( Name, Title, Abstract?, StyleURL?
) >
<!ELEMENT StyleURL (#PCDATA) >
<!ELEMENT ScaleHint EMPTY >
<!ATTLIST ScaleHint min CDATA #REQUIRED max CDATA #REQUIRED >

An example capabilities document would be:

<WMT_MS_Capabilities version="1.0.0" updateSequence="0">
  <Service>
    <Name>GetMap</Name>
    <Title>Acme Corp. Map Server</Title>
    <Abstract>Contact: webmaster@wmt.acme.com.</Abstract>
    <Keywords>bird roadrunner ambush</Keywords>
    <OnlineResource>http://hostname:port/path/</OnlineResource>
    <Fees>none</Fees>
    <AccessConstraints>none</AccessConstraints>
  </Service>
  <Capability>
    <Request>
      <Map>
        <Format> <GIF /> <JPEG /> <PNG /> <WebCGM /> <SVG /> </Format>
        <DCPType>
          <HTTP>
            <Get onlineResource="http://hostname:port/path/mapserver.cgi" />
            <Post onlineResource="http://hostname:port/path/mapserver.cgi" />
          </HTTP>
        </DCPType>
      </Map>
      <Capabilities>
        <Format> <WMS_XML /> </Format>
        <DCPType>
          <HTTP>
            <Get onlineResource="http://hostname:port/path/mapserver.cgi" />
          </HTTP>
        </DCPType>
      </Capabilities>
    </Request>
    <Exception>
      <Format> <BLANK /> <WMS_XML /> </Format>
    </Exception>
    <Layer>
      <Title>Acme Corp. Map Server</Title>
      <SRS>EPSG:4326</SRS> <!-- all layers are available in at least this SRS -->
      <Layer queryable="0">
        <Name>wmt_graticule</Name>
        <SRS> EPSG:4326 </SRS>
        <SRS> EPSG:26986 </SRS>
        <Title>Alignment test grid</Title>
        <Abstract>The WMT Graticule is a 10-degree grid suitable for testing alignment among Map Servers.</Abstract>
        <Keywords>graticule test</Keywords>
        <LatLonBoundingBox minx="-180" miny="-90" maxx="180" maxy="90" />
        <Style>
          <Name>on</Name>
          <Title>Show test grid</Title>
          <Abstract>The "on" style for the WMT Graticule causes that layer to be displayed.</Abstract>
        </Style>
        <Style>
          <Name>off</Name>
          <Title>Hide test grid</Title>
          <Abstract>The "off" style for the WMT Graticule causes that layer to be hidden even though it was requested from the Map Server.
Style=off is the same as not requesting the graticule at all.</Abstract>
        </Style>
      </Layer>
      <Layer queryable="0">
        <Name>ortho</Name>
        <Title>MassGIS half-meter 1:5000 orthophoto series</Title>
        <Abstract>Panchromatic imagery mosaics for the Boston metropolitan area. Ground resolution: 0.5m. Photo dates range from 1992 to 1995.</Abstract>
        <Keywords>Boston Massachusetts MassGIS orthophoto</Keywords>
        <SRS>EPSG:26986</SRS>
        <SRS>EPSG:26930</SRS>
        <SRS>EPSG:26917</SRS>
        <SRS>EPSG:4326</SRS>
        <LatLonBoundingBox minx="-71.634696" miny="41.754149" maxx="-70.789798" maxy="42.908459" />
        <BoundingBox SRS="EPSG:26986" minx="189000" miny="834000" maxx="285000" maxy="962000" />
        <Style>
          <Name>Default</Name>
          <Title>The only style</Title>
          <Abstract>The only style for this imagery series.</Abstract>
        </Style>
        <ScaleHint min="0.05" max="500" />
      </Layer>

Appendix C Projection and Re-projection

C.1 Projection Surfaces

[Figure C.1 diagram: sample projection surfaces, including regular cylindrical, transverse cylindrical, oblique cylindrical, regular conic, polar azimuthal (plane) and oblique azimuthal (plane).]

Figure C.1 Sample Projection Surfaces [89].

C.2 Types of Projections

Projections have been classified according to the characteristics of the map that they maintain: area, shape and scale [89]. For instance:

- Equi-areal projections maintain the area proportions of a map, i.e., two regions of the same size on the projected map cover the exact same area of the Earth.
- Conformal projections maintain the shape characteristic: relative local angles about every point on the map are shown correctly.
- Equidistant projections maintain a correct scale, ensuring that the map contains one or more lines along which the scale remains true.

C.3 Interpolation/Resampling Methods

There are several methods used in the re-projection interpolation step to determine the brightness value of each pixel in the new projected image:

- The nearest neighbor method assigns the original pixel value that is nearest to (x0,y0) to (x,y).
By doing so, this method does not produce new pixel values; it only uses the values that were present in the original image. This method is commonly used because it is very simple and hence computationally faster than other methods. It suffers, however, from round-off errors and creates geometric discontinuities in the output map because it does not produce intermediary pixel values. In general, for visual purposes, these discontinuities are often negligible.

- The bilinear interpolation algorithm uses the weighted average of the four pixels that surround the point (x0,y0) to estimate the brightness of (x,y). The advantage of this method is that it produces a smoother and more continuous image. It is, however, slower because it involves more computations, and it alters the original pixel values, creating a problem for spectral pattern recognition analysis.

- The cubic interpolation method uses the weighted average of the sixteen pixels that surround the point (x0,y0) to estimate the brightness of (x,y), producing the sharpest and smoothest image. Its drawback is that it is the most computationally expensive, taking on average twice as much computation as the bilinear method. Like the bilinear method, it also alters the original pixel values, creating a problem for spectral pattern recognition analysis.

C.4 Prototype Capabilities Summary

As mentioned in Chapter 3, the use of ArcInfo as the back-end server for the provision of the prototype's re-projection capabilities imposed some constraints on the range of functionality of the final product. For instance, the formats supported by our prototype were restricted to those supported by ArcInfo (bmp, tif, jpg and gif).
For the same reason, the geo-referencing information types handled by the prototype were limited to the world file and GeoTIFF formats. In addition, the prototype inherited ArcInfo's inability to handle cases where the zoom factors are not identical in the x and y directions, or where the target projection is in the latitude/longitude coordinate system. In terms of projections, a subset of the EPSG codes was supported to facilitate the chaining of the re-projection service with WMT-compliant map servers, a majority of which use the EPSG identifiers. However, although seemingly facilitating the representation of projections, we have found that the EPSG codes are not yet widely used by GIS vendors such as ESRI (even though ESRI is a participant in the WMT testbeds). Consequently, part of the implementation effort was dedicated to creating translation tables between those codes and the internal SRS representations in ArcInfo, and later in a script using the proj utilities. The SRSs supported, along with their equivalent parameters in ArcInfo, are shown in Table C.1.

EPSG code   Projection          Datum   Units    Fipszone   Zone
4267        geographic          nad27   dms
4269        geographic          nad83   dms
4326        geographic          wgs84   dms
26986       stateplane          nad83   meters   2001
26987       stateplane          nad83   meters   2001
26786       stateplane          nad27   feet     2002
26787       stateplane          nad27   feet     2002
32030       stateplane          nad27   feet     3800
32130       stateplane          nad83   feet     3800
            obgm                        meters
26930       stateplane          nad83   meters   102
26730       stateplane          nad27   feet     102
26717       utm                 nad27   meters              17
26917       utm                 nad83   meters              17
32617       utm                 wgs84   meters              17
26718       utm                 nad27   meters              18
26918       utm                 nad83   meters              18
32618       utm                 wgs84   meters              18
26719       utm                 nad27   meters              19
26919       utm                 nad83   meters              19
32619       utm                 wgs84   meters              19
27700       greatbritain-grid

Table C.1 EPSG projections supported by the re-projection prototype.

C.5 Re-projection Approximation

During the implementation phase, we noticed that re-projection using ArcInfo was both slow and expensive in terms of computational resources.
At the same time, we observed that the resultant re-projected versions of images covering small areas appeared to be rotated and/or scaled versions of the original image. Indeed, as illustrated in Figure C.2 and Figure C.3, parallel lines are maintained through certain projections of these areas. This suggested that a linear approximation of the re-projection transformation is possible in these cases.¹ Given this observation, the objective of the approximation is articulated as follows: given three non-collinear points of the original image and their exact re-projected coordinates in a new coordinate system, the transformation matrix needs to be determined and decomposed into translation, rotation and scaling components. The exact re-projected coordinates of the initial non-collinear points can be found using the proj utility. After determining the angle of rotation and the scaling factor, simple tools (such as pnmscale and pnmrotate) are used to transform the entire image. The examples shown next illustrate the development and application of this approximation on the re-projection of a 500x500 pixel image from Mass State Plane (Mass Mainland) to Nad83 Lat/Lon (EPSG code 4269), Nad83 Alabama West (EPSG code 26930) and Nad83 UTM zone 17 (EPSG code 26917). The rectangle used in these examples corresponds to the values shown in the following two figures.

1. The extent to which these approximations work remains to be determined.

[Figure C.2 plot: a rectangle plotted in Mass State Plane coordinates, with x ranging from about 236000 to 238000 and y from about 901200 to 902800.]

Figure C.2 A rectangle in Mass State Plane Coordinates.

[Figure C.3 plot: the same rectangle re-projected to Lat/Lon coordinates, near longitude -71.05.]

Figure C.3 The reprojected rectangle in Lat/Lon, indicating that parallel lines were maintained through the projection.
Mass X      Mass Y      Lon          Lat
236407      902590.36   -71.057965   42.372968
237587.44   902590.36   -71.043633   42.372912
237587.44   901402.36   -71.04371    42.362217
236407      901402.36   -71.05804    42.362273
236997.22   902590.36   -71.050799   42.37294
237587.44   901996.36   -71.043672   42.367564
236997.22   901402.36   -71.050875   42.362245
236407      901996.36   -71.058002   42.36762

Table C.2 Mass State Plane points projected to Lat/Lon.

C.5.1 Determining the Transformation Matrices for the Approximation

In this example, we show the steps followed to determine the approximation parameters for the re-projection case from Mass State Plane to Lat/Lon. First, three non-collinear points are selected and their projected equivalents in the target projection are calculated. In order to simplify the decomposition of the final transformation matrix into its rotation, translation and scaling components, homogeneous coordinates are used with z = 1. Below, matrices X and Y contain the coordinates of the three points in the target and source projections respectively.

    X = | -71.057965   42.372968   1 |
        | -71.043633   42.372912   1 |
        | -71.04371    42.362217   1 |

    Y = | 236407      902590.36   1 |
        | 237587.44   902590.36   1 |
        | 237587.44   901402.36   1 |

If T is the transformation matrix (transforming Y into X), then X = Y * T implies T = Inv(Y) * X. (The inverse of Y exists since, as a result of the non-collinearity of the three points, its determinant is non-zero.)

    T = Inv(Y) * X = |  1.21412x10^-5   -4.74399x10^-8    3.86784x10^-20 |
                     |  6.48148x10^-8    9.00253x10^-6   -1.0842x10^-19  |
                     | -73.98673928     34.25859062       1              |

The structure of the above transformation matrix indicates that the transformation is perspective-free (the last column is effectively 0 0 1), and can hence be decomposed into translation, rotation and scaling components.
    [ a11  a12  0 ]   [ 1   0   0 ]   [  cosθ  sinθ  0 ]   [ Sx  0   0 ]
T = [ a21  a22  0 ] = [ 0   1   0 ] x [ -sinθ  cosθ  0 ] x [ 0   Sy  0 ]
    [ a31  a32  1 ]   [ Tx  Ty  1 ]   [  0     0     1 ]   [ 0   0   1 ]

so that

    [  Sx cosθ                   Sy sinθ                   0 ]
T = [ -Sx sinθ                   Sy cosθ                   0 ]
    [  Sx (Tx cosθ - Ty sinθ)    Sy (Tx sinθ + Ty cosθ)    1 ]

Solving, we get:

Sx = 0.1214x10^-4
Sy = 0.9x10^-5
Tx = -6113982.409
Ty = 3773223.926
θ = 0.30586509 degrees
Error (square root of the sum of squared distances) = 2.82842x10^-6

For comparison purposes, the images obtained using this approximation, as well as those obtained using actual re-projection in ArcInfo, are attached to this appendix. The same process is applied to the projections from Mass State Plane to Alabama West and to UTM zone 17, as shown below.

Mass X      Mass Y      Alabama X     Alabama Y
236407      902590.36   1955680.61    1505879.265
237587.44   902590.36   1956865.815   1506108.656
237587.44   901402.36   1957096.676   1504915.847
236407      901402.36   1955911.418   1504686.495
236997.22   902590.36   1956273.212   1505993.934
237587.44   901996.36   1956981.221   1505512.19
236997.22   901402.36   1956504.046   1504801.144
236407      901996.36   1955796.072   1505282.835
Table C.3 Mass State Plane points projected to Alabama West State X,Y.

T = Inv(Y) * X = [  1.00403663    0.19432669    3.86784x10^-20 ]
                 [ -0.19432744    1.00404798   -1.0842x10^-19  ]
                 [  1893717.397   553695.0472   1              ]

Sx = 1.0226694
Sy = 1.0226804
Tx = 1920882.015
Ty = 179689.0226
θ = 10.95394002 degrees
Error (square root of the sum of squared distances) = 0.10894959

Mass X      Mass Y      UTM X         UTM Y
236407      902590.36   1318939.874   4739113.671
237587.44   902590.36   1320122.167   4739247.138
237587.44   901402.36   1320256.488   4738057.319
236407      901402.36   1319074.152   4737923.877
236997.22   902590.36   1319531.02    4739180.379
237587.44   901996.36   1320189.297   4738652.169
236997.22   901402.36   1319665.319   4737990.572
236407      901996.36   1319007.065   4738518.724
Table C.4 Mass State Plane points projected to UTM zone 17 X,Y.
T = Inv(Y) * X = [  1.00157022    0.11306506    3.86784x10^-20 ]
                 [ -0.11306415    1.001531     -1.0842x10^-19  ]
                 [  1184212.274   3808412.052   1              ]

Sx = 1.0079318
Sy = 1.0078929
Tx = 1591338.21
Ty = 3622940.495
θ = 6.4406749 degrees
Error (square root of the sum of squared distances) = 0.10294927

C.5.2 A Simpler Approach

Given that raster images are basically arrays of pixels (which are, in the end, coordinate-free), a simpler way of obtaining the rotation angle and the scaling factors is as follows. The translation can be ignored, since it only applies to the geographic coordinates and has no meaning on a pixel basis. Let Lx1 and Ly1 denote the side lengths of the original image, Lx2 and Ly2 the corresponding side lengths of the reprojected image, and (Xlb, Ylb) and (Xrb, Yrb) the coordinates of the bottom-left and bottom-right corners of the reprojected image.

[Figure omitted: the original image with side lengths Lx1 and Ly1, and the reprojected (rotated) image with side lengths Lx2 and Ly2 and bottom corners (Xlb, Ylb) and (Xrb, Yrb).]

This leads to:

θ = ArcTan[(Yrb - Ylb) / (Xrb - Xlb)]
Sx = Lx2 / Lx1
Sy = Ly2 / Ly1

C.5.3 Sample Images

[Images omitted: the original image, followed by the Lat/Lon, Alabama West and UTM 17 approximations, each paired with the corresponding reprojection produced in ArcInfo.]
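The pixel-level shortcut of Section C.5.2 can be sketched directly. The function below is illustrative (names and the synthetic test values are ours); it recovers θ from the tilt of the reprojected bottom edge and the scale factors from the ratios of side lengths, ignoring translation as the text prescribes.

```python
import math

def params_from_corners(lx1, ly1, lb, rb, lt):
    """Rotation angle (degrees) and scale factors for the pixel-level
    approximation.
    lx1, ly1: side lengths of the original image.
    lb, rb, lt: reprojected bottom-left, bottom-right and top-left
    corners.  Translation is ignored: it has no meaning per-pixel."""
    theta = math.atan2(rb[1] - lb[1], rb[0] - lb[0])  # tilt of bottom edge
    lx2 = math.hypot(rb[0] - lb[0], rb[1] - lb[1])    # reprojected sides
    ly2 = math.hypot(lt[0] - lb[0], lt[1] - lb[1])
    return math.degrees(theta), lx2 / lx1, ly2 / ly1  # theta, Sx, Sy

# Synthetic check: a 4x2 rectangle scaled by (2, 3), then rotated 30 degrees
t = math.radians(30.0)
def warp(x, y):
    xs, ys = 2.0 * x, 3.0 * y
    return (xs * math.cos(t) - ys * math.sin(t),
            xs * math.sin(t) + ys * math.cos(t))

theta_deg, sx, sy = params_from_corners(4.0, 2.0,
                                        warp(0, 0), warp(4, 0), warp(0, 2))
# recovers theta = 30 degrees, sx = 2, sy = 3
```

With θ, Sx and Sy in hand, the actual image warp is then delegated to tools such as pnmrotate and pnmscale, as described earlier in the appendix.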