TICS: Thin Informedia Client System
A Distributed Systems (15-612) Project

By
Arshaad Amirza     amirza@andrew.cmu.edu
Brian Willingham   bdw2@andrew.cmu.edu
Danny Lee          dlee@cs.cmu.edu
Keng Lim           kengl@andrew.cmu.edu

Executive Summary

The project aims to transform the current Informedia [1] client application into a distributed client/server application. We will reduce the computing resource requirements imposed on the client machine by moving the query engines of the client application to remote systems. We will ensure that such a system is highly available by applying replication techniques. We will also seek to improve performance by load balancing search queries across a set of replicated search engines. Tradeoffs in performance due to the network latencies inherent in a distributed system will also be examined.

[1] The Informedia Digital Video Library is a research initiative at Carnegie Mellon University funded by the NSF, DARPA, NASA and others. Research in the areas of speech recognition, image understanding and natural language processing supports the automatic preparation of diverse media for full-content, knowledge-based search and retrieval.

1. Requirements Analysis

To access digital video libraries, a client machine currently loads the following into the same memory address space:

- A Visual Basic front-end application called IDVLS.
- Three DLLs for interfacing with the search engines.
- The search engines themselves (currently identified as the Text Search Engine, the Face/Image Search Engine and the Language Translation Search Engine).

A written statement from Andrew Berry gave us an outline of the issues we would have to consider. Our meeting with Michael Christel, Alex Hauptman and Norm proved fruitful in assessing the current system's performance. These are the problems of the current system that we have identified:

Poor Response Time
The first query takes a long time because the search indices have to be built. In a demonstration of the current system on a Pentium Pro PC with 32 MB of RAM, this took approximately five minutes [2].

Idle Time Between Queries
While the application is building the search indices, it sits idle (with an hourglass icon displayed). The user is prevented from performing any other operations.

Memory Intensive
The current requirements on the client machine are very memory intensive; the modules loaded are outlined above. Users of a client system frequently run into "low on virtual memory" errors.

The goal of our project is to resolve the above problems by creating a thin client that imposes lower system requirements on the client machine.

[2] More benchmarks will be provided in our final project documentation.

2. High Level Design

We will achieve this goal by employing distributed techniques. The current system will be dissected into modules that will be distributed, giving the client system access to a larger pool of computing resources.

Our design extracts the various components of the current client into individual modules. The design consists of Client Modules, Query Managers (QM), a Load Balancer (LB) and Search Engines (SE). These modules are illustrated in Figure 2.

Client Module
The Client Module (CM) represents the client software residing on client machines. It consists of two components: the IDVLS Visual Basic application and a Client Request Manager. An API layer will interface the IDVLS application and the Client Request Manager.
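To make the role of this API layer concrete, the following is a minimal sketch, in C++, of the kind of interface the IDVLS front end might call. The names (QueryType, QueryResult, ClientRequestManager, submitQuery) are hypothetical and do not appear in our design documents; the sketch only illustrates the separation between the front end and the Client Request Manager, not a final specification.

    // Hypothetical sketch of the API layer between the IDVLS front end and
    // the Client Request Manager (CRM). Names and signatures are illustrative.
    #include <string>
    #include <vector>

    // The kinds of queries the back-end search engines currently support.
    enum class QueryType { Text, FaceImage, LanguageTranslation };

    // A single hit returned by a search engine, reduced to the essentials.
    struct QueryResult {
        std::string segmentId;   // identifier of the matching video segment
        double      score;       // relevance score assigned by the search engine
    };

    // The interface IDVLS sees. A concrete implementation would forward each
    // request to the Client Request Manager, which in turn talks to the
    // primary Query Manager (or its backup on failure).
    class ClientRequestManager {
    public:
        virtual ~ClientRequestManager() = default;

        // Submit a query and block until results arrive from the server side.
        virtual std::vector<QueryResult> submitQuery(QueryType type,
                                                     const std::string& query) = 0;

        // True if the CRM currently has a live connection to a Query Manager.
        virtual bool isConnected() const = 0;
    };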
Client Request Manager
The Client Request Manager (CRM) manages requests from the IDVLS application and transfers these requests to the Query Manager. The CRM continuously probes the Query Manager to check that it is alive. In the event of a failure, the CRM reroutes requests to the backup Query Manager. The Client Request Manager also makes it possible to serve query requests from more than one IDVLS instance on a single client machine, assuming that multiple copies of IDVLS can run in a single address space.

Query Manager
This module manages requests from multiple clients. In Figure 2, we show the primary Query Manager (QM) processing requests from two separate clients. The requests are processed and sent to the appropriate Load Balancer module. It is the function of the Query Manager to determine, based on the type of each request, which back-end search engine will receive it. For instance, if a query is a text query, the Query Manager will ensure that it is routed to a Text Search Engine.

Load Balancer
The Load Balancer (LB) is a module that sits transparently between a Query Manager and a set of search engines. A Query Manager sends a query to a Load Balancer and receives the result of the query as a reply; as far as the Query Manager is concerned, the Load Balancer simply returns the results of queries. The Load Balancer, as its name suggests, provides the functionality needed to distribute queries and balance the load across a set of search engines.

Search Engine
Each Search Engine (SE) module encapsulates a search engine that actually processes a query. We replicate the Search Engine modules in order to perform load balancing.

3. Distributed Systems Innovations

Our design will employ the following distributed systems features:

Naming Service
We will define protocols so that the Client Request Manager can identify the primary Query Manager. In the event that the primary Query Manager fails, the Client Request Manager must be able to route requests via the backup Query Manager. Name resolution will also enable the Query Manager to identify the primary Load Balancer module.

Replication Management Approach
For both the Query Manager and the Load Balancer modules, we will use a primary-backup approach to provide fault tolerance. We have yet to determine the exact number of backup modules. In Figure 2, only one backup is shown per module; however, this does not imply that our final design and implementation will have only one backup per module.

Load Balancing
The Load Balancer serves as a hub for all requests. It has a host of search engines connected to it and maintains a set of queues of query requests, one queue per Search Engine. By managing these queues, the Load Balancer can determine the load on each Search Engine from its queue length. A minimal sketch of this queue-length policy is given at the end of this document.

4. Current Development

The next phase of our project entails a detailed implementation design. We are currently in the process of creating a class hierarchy for the entire system. We have also begun work on implementing a simple client/server prototype in CORBA, which will be the enabling distributed object technology employed in this project. The development platform of choice is C++ for the client and Java for some of the server modules. Implementation of the system will span multiple platforms: thin clients will run on Windows NT workstations, and server modules will run on both Unix and Windows NT.
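As a concrete illustration of the queue-length policy described in Section 3, the following is a minimal, self-contained C++ sketch of how a Load Balancer might select the least-loaded Search Engine. All names (LoadBalancer, SearchEngine, dispatch, the engine identifiers and sample queries) are hypothetical and chosen only for illustration; the real module will also have to handle replies, failures and the CORBA plumbing described in Section 4.

    // Minimal sketch of queue-length-based load balancing. All names are
    // illustrative; queue length is used as the estimate of each engine's load.
    #include <deque>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    struct SearchEngine {
        std::string             name;    // e.g. "text-se-1"
        std::deque<std::string> queue;   // queries waiting to be processed
    };

    class LoadBalancer {
    public:
        explicit LoadBalancer(std::vector<SearchEngine> engines)
            : engines_(std::move(engines)) {}

        // Enqueue a query on the engine with the shortest queue and report
        // which engine was chosen. Assumes at least one engine is registered.
        const std::string& dispatch(const std::string& query) {
            SearchEngine* target = &engines_.front();
            for (SearchEngine& se : engines_) {
                if (se.queue.size() < target->queue.size()) {
                    target = &se;
                }
            }
            target->queue.push_back(query);
            return target->name;
        }

    private:
        std::vector<SearchEngine> engines_;
    };

    int main() {
        LoadBalancer lb({{"text-se-1", {}}, {"text-se-2", {}}, {"text-se-3", {}}});
        for (const char* q : {"query one", "query two", "query three"}) {
            std::cout << q << " -> " << lb.dispatch(q) << "\n";
        }
        return 0;
    }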