Evaluating Parallel R-Tree Implementations on a Network of Workstations
(An Extended Abstract)

Ning An, Liujian Qian, Anand Sivasubramaniam, Tom Keefe
Department of Computer Science & Engineering
The Pennsylvania State University
University Park, PA 16802.
Phone: (814) 865-1406  Fax: (814) 865-3176
Email: anand@cse.psu.edu

This research is supported in part by an NSF Career Award MIP-9701475, EPA grant R825195-01-0, and equipment grants from NSF and IBM.

1 Introduction

Several GIS applications are characterized by the vast amount of information that needs to be stored, retrieved and analyzed. The volume and complexity of this data will continue to grow. NASA’s EOSDIS project, for example, is expected to produce a petabyte-scale geo-spatial data set: a collection of raster images arriving from satellites orbiting the earth at a rate of 3-5 Mbytes/second for 10 years. Beyond handling such large data sets, a GIS should also be able to query this data efficiently enough to meet real-time constraints. Queries to a GIS are not necessarily limited to spatial searches or selections, and the response times for all queries should be kept as low as possible. To summarize, a GIS must meet three main requirements to handle the demands of current and emerging applications: (1) it must be able to store large data sets; (2) it must be able to store, retrieve and process these large data sets efficiently; and (3) it must provide low response times and high throughput for complex queries on the data sets.

To meet these requirements, a GIS must employ a high performance computer system. Conventional platforms for GIS have used high performance Input/Output (I/O) subsystems (to store the large repositories of data) attached to a high performance workstation [3]. However, despite the I/O parallelism offered by some of these systems (such as RAID), the channel between the processing center and the I/O system can itself become a bottleneck, limiting the speed of data transfer. Further, such an architecture provides no additional computational power for executing complicated queries beyond the raw processing power of the native workstation. This observation leads us to believe that a balanced high performance platform for a GIS should support parallelism in processing (CPUs), in primary (memory) and secondary (disk) storage, and in I/O channels.

Recent trends in computer architecture show that a Network of Workstations (NOW) is emerging as a cost-effective solution to high performance computing. The term "workstations" is used rather loosely in this context and includes high performance personal computers as well. Such systems are commercially more viable than custom-built parallel machines since they use cost-effective off-the-shelf components. Advances in networking technology have made it possible to connect off-the-shelf workstations with a high bandwidth network such as ATM or Myrinet. Further, several communication software layers have recently been developed for these networks that deliver low latency and high end-to-end bandwidth by bypassing the operating system in the data transfer path. It is thus feasible today to put together a cost-effective high performance platform for GIS from rapidly (and constantly) improving off-the-shelf workstations and network hardware. The multiple CPUs and their memories can provide processing and primary storage parallelism, while disks connected to individual workstations on the network can provide secondary storage parallelism for both data access and data transfer. However, several open research issues must be addressed to harness the full capabilities of such a platform for a high performance GIS, and this study takes a step towards this goal.
Several algorithmic, software and hardware design alternatives and parameters can significantly impact the performance of a GIS on a NOW platform. On the algorithmic side, the choice of data structures used to maintain the geographic information, and of the algorithms that manipulate them, is largely influenced by the information being stored and the queries posed on it. For instance, numerous data structures have been proposed [6] and evaluated to support spatial access; however, these evaluations have not considered a parallel implementation on a NOW platform. On the software side, the distribution of data between workstations dictates the communication overheads in the execution. Placing a majority of the data locally would lower a workstation's communication costs, but it could create a workload mismatch between workstations, resulting in a loss of parallelism. Queries focused on a few data items should involve as few workstations as possible to minimize communication overheads, while queries with large search windows should be distributed uniformly across the workstations. It is thus important to keep both data placement and load balancing in mind when designing a GIS on a NOW platform. There are also several messaging alternatives, ranging from traditional (and expensive) TCP/IP sockets, to RPCs, to more recent (and efficient) user-level messaging layers, that have a direct bearing on the communication and synchronization overheads. On the hardware side, the processing capabilities of the workstation CPUs, the amount of physical memory, the disk bandwidth, and the network connecting the workstations are just a few of the parameters likely to impact performance. There are thus numerous design alternatives for implementing a GIS on a NOW.
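The placement trade-off described above can be made concrete with a small toy model. The Python sketch below is not the authors' implementation; it assumes a hypothetical 8 x 8 grid of unit-square R-tree leaf blocks, four machines, a strip-based `clustered` placement, a `round_robin` placement, and two hypothetical query windows. For each query it counts how many intersecting blocks each machine would have to search.

```python
from collections import Counter

# Hypothetical toy setup: an 8 x 8 grid of R-tree leaf blocks spread over 4 machines.
GRID = 8
MACHINES = 4

# Each leaf block is summarized by its minimum bounding rectangle (x1, y1, x2, y2),
# listed in row-major order so that index = y * GRID + x.
BLOCKS = [(x, y, x + 1, y + 1) for y in range(GRID) for x in range(GRID)]


def clustered(index, block):
    """Keep spatial neighbours together: vertical strips GRID // MACHINES columns wide."""
    return block[0] // (GRID // MACHINES)


def round_robin(index, block):
    """Ignore geometry and deal blocks out to machines in row-major order."""
    return index % MACHINES


def overlaps(a, b):
    """Rectangles intersect when their extents overlap on both axes.

    Strict comparisons mean blocks that merely share an edge with the window
    are not counted.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def load_per_machine(placement, query):
    """Count the blocks intersecting `query` that each machine must search."""
    load = Counter()
    for index, block in enumerate(BLOCKS):
        if overlaps(block, query):
            load[placement(index, block)] += 1
    return dict(sorted(load.items()))


if __name__ == "__main__":
    narrow = (0.5, 0.5, 1.5, 1.5)  # small window covering a 2 x 2 patch of blocks
    wide = (0.0, 0.0, 4.0, 8.0)    # large window covering the left half of the space
    for name, placement in [("clustered", clustered), ("round-robin", round_robin)]:
        print(name, "narrow:", load_per_machine(placement, narrow),
              "wide:", load_per_machine(placement, wide))
```

Running the sketch prints:

    clustered narrow: {0: 4} wide: {0: 16, 1: 16}
    round-robin narrow: {0: 2, 1: 2} wide: {0: 8, 1: 8, 2: 8, 3: 8}

In this toy model, clustering answers the narrow query on a single machine but leaves half the machines idle for the wide query, while round-robin placement spreads the wide query evenly at the cost of contacting two machines even for the narrow one.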
A comprehensive evaluation of all these alternatives is too ambitious, but this study takes a first step towards this goal by holding some of the alternatives constant and varying others. First, we limit the study to spatial data structures, specifically the R-Tree [2] for storing vector data, and evaluate the performance of R-Tree insertions and spatial searches on the NOW platform. The NOW hardware in this exercise consists of up to a dozen UltraSPARC Model 170 workstations connected by 100 Mbit/sec Ethernet and 1.28 Gbit/sec Myrinet (both switched networks). Kernel-based TCP/IP sockets are used for communication on this hardware.

An extensive implementation and evaluation of a geo-spatial DBMS for shared-nothing architectures has been undertaken in [5]. While this is a comprehensive summary of experiences in developing a complete environment, it offers little guidance on how best to distribute the data between the different servers to balance load and minimize communication. The study closest to the one presented in this paper is that of Koudas et al. [4], which outlines a technique to decluster an R-tree across a network of workstations. However, they consider just one possible way of distributing the R-tree structure. To our knowledge, this is perhaps the first study to extensively evaluate a spatial data structure such as the R-Tree experimentally on a Network of Workstations (NOW) platform and to investigate the trade-offs between design alternatives. The specific contributions of this paper are: (1) a generic framework for distributing hierarchical spatial data structures; (2) the use of this framework to implement and evaluate different data distribution and load balancing schemes for R-Trees; and (3) a study of the impact of the number of workstations, the size and nature of the data set, and the network hardware (Ethernet vs. Myrinet) on the performance of insert and spatial search operations under the different distribution schemes, in the Sun Solaris 2.6 environment on UltraSPARC-170 workstations.

2 Design and Implementation

The R-tree (and its variants) is an effective and well understood multi-dimensional access method, so we focus on methods of adapting it to the NOW platform rather than developing a new spatial index. There are numerous design alternatives for implementing this data structure on a NOW platform, and we have designed a taxonomy to classify them. The nodes that make up the R-tree are distributed among a number of virtual machines, which can eventually be mapped onto physical machines. As the tree grows, new portions are allocated and integrated into the index. We partition the space of design alternatives for implementing the R-tree along three dimensions: (1) the allocation unit; (2) the allocation frequency; and (3) the distribution policy.

An allocation unit defines the granularity of allocation, and there are three possible units: (a) Element, where every data item is individually allocated to a machine; (b) Block, where a small fixed number of index nodes are allocated to machines independently; and (c) Subtree, where a large number of blocks are allocated as a unit. The allocation frequency determines when these allocations are made, and we identify two choices: (a) Static, where allocation is done once, when the index is built; and (b) Overflow, where allocation decisions are made whenever an allocation unit overflows. The distribution policy determines on which (virtual or physical) machine an allocation is made. Three distribution policies are considered: (a) Clustering, which tries to allocate units that are spatially near each other on the same machine; (b) Declustering, which tries to allocate units that are spatially near each other on different machines; and (c) Balance, which attempts to balance the data between the machines (for example, by allocating units in a round-robin fashion). Further details on this design taxonomy, and on the implementation of an extensive client-server system that can be used to prototype the different design choices, can be found in [1].

3 Summary of Results

We have experimentally evaluated the performance of two different implementations of a parallel R-Tree on our NOW platform. Three kinds of queries (insert, spatial search with a large query window, and spatial search with a narrow query window) have been run on both clustered and uniform data sets. The metrics used for comparison are the response times for insert operations and spatial searches, as well as the throughput over multiple queries for spatial searches with a narrow query window. The results show that parallelizing the R-Tree structure on a NOW platform of eight Sun UltraSPARC workstations can significantly improve response times and throughput. The reader is referred to [1] for detailed performance results.

References

[1] N. An, L. Qian, A. Sivasubramaniam, and T. Keefe. Evaluating Parallel R-Tree Implementations on a Network of Workstations. Technical Report CSE-98-006, Dept. of Computer Science and Engineering, The Pennsylvania State University, May 1998.
[2] A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proceedings of the 1984 ACM-SIGMOD Conference, pages 47–57, Boston, MA, June 1984.
[3] I. Kamel and C. Faloutsos. Parallel R-Trees. In Proceedings of the 1992 ACM-SIGMOD Conference, pages 195–204, CA, June 1992.
[4] N. Koudas, C. Faloutsos, and I. Kamel. Declustering spatial databases on a multi-computer architecture. In EDBT, pages 592–614, Avignon, France, March 1996.
[5] J. Patel et al. Building a Scalable Geo-Spatial DBMS: Technology, Implementation, and Evaluation. In Proceedings of the 1997 ACM-SIGMOD Conference, pages 336–347, June 1997.
[6] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1989.