InteGrade: Object-Oriented Grid Middleware Leveraging Idle Computing Power of Desktop Machines
http://gsd.ime.usp.br/InteGrade

Andrei Goldchleger, Fabio Kon, Alfredo Goldman, and Marcelo Finger
{andgold,kon,gold,mfinger}@ime.usp.br
Department of Computer Science, IME/USP

Motivation: need for computation
• High demand for computationally intensive applications
  – multimedia processing
  – scientific computing
  – financial simulations and predictions
  – weather forecasting
  – oil drilling
  – scheduling, planning, etc.

Motivation: waste of resources
• Corporations, universities, and governments have hundreds or thousands of desktop computers for their employees and students.
• Desktops are idle 99% of the time
  – idle at night (6 PM to 8 AM)
  – idle during work hours
  – idle even when users are typing on the desktop keyboard
• Dedicated clusters are idle most of the time, generating heat and noise

Paradox
1. High demand for computational power
2. High level of idle resources
• Third-world countries like Brazil cannot afford to waste resources like that.
• Developed countries should also manage their resources better, at least for environmental reasons.
InteGrade’s goal is to solve this paradox

Team Members
• Alfredo Goldman, Fabio Kon, Marcelo Finger, and Siang W. Song (DCC – IME/USP)
• Markus Endler and Renato Cerqueira (DI – PUCRio)
• Edson Cáceres and Henrique Mongelli (DCT – UFMS)
• Approximately 10 graduate students

InteGrade: Description
• Middleware to build a grid of commodity machines
• Desktop users (Resource Providers) export their resources to the grid
• Grid applications use only idle resources
• Advantages over traditional dedicated clusters of commodity hardware

InteGrade: Key Features
• Based on standard distributed object-oriented technology (CORBA)
• Preserves resource provider’s QoS at all costs
• Supports a wide range of parallel applications
• Usage pattern collection and analysis

InteGrade: OO CORBA Middleware
• Communication and architecture based on the CORBA industry standard
  – Object orientation at all levels
  – Platform independent
  – Language independent
• Leverages existing CORBA services (e.g. naming, trading, events)
• Exports functionality as CORBA services
• If desired, can also operate with other communication models
  – Sockets, MPI, BSP, etc.

Feature: Preserves Resource Provider’s QoS
• User-level scheduler (DSRT) limits resource consumption of Grid applications
• Lightweight CORBA ORB (O2)
• Configurable Resource Sharing (Optional)

Feature: Usage Pattern Collection and Analysis
• Enhances scheduling by offering an approximate view of resource utilization
• Usage data is collected at short intervals (e.g. 5 min.) and analysed
• Data is grouped into larger intervals called periods
• Clustering algorithms applied to the data will derive behavioral categories (e.g. night, lunch break, weekdays)
• Each machine learns about the utilization of its resources and uses knowledge of the past to predict the future

Feature: Support for a Wide Range of Parallel Applications
• Often unsupported by other grid initiatives, especially ones that make opportunistic use of shared resources
  – In most Grid systems, parallel applications must have little or no communication among application nodes
• InteGrade research focuses on other kinds of parallel applications (with communication)
• Information about the links interconnecting nodes must be collected and used for scheduling

Feature: Ensures Application Progress
• Usage pattern collection and analysis provides hints, minimizing interruptions
• Checkpointing for sequential applications
  – Must be implemented in a machine- and OS-independent way
• Progress of parallel applications is more difficult to ensure, requiring globally consistent checkpoints
• Possible solution: use BSP as the parallel application model

Architecture: IntraCluster
LRM - Local Resource Manager
GRM - Global Resource Manager

Architecture: IntraCluster
LUPA - Local Usage Pattern Analyzer
GUPA - Global Usage Pattern Analyzer

Architecture: IntraCluster
NCC - Node Control Center
ASCT - Application Submission and Control Tool

Architecture: InterCluster

Related Work
• Our work is influenced by 5 systems:
  – Globus, Legion, Condor, SETI@home, and 2K
• Condor (U. of Wisconsin-Madison)
  – Pioneer (started in the late 1980s)
  – A “hunter” of idle workstations on local networks
  – Condor-G interfaces with Globus for integration with wide-area grids
  – Support for parallel applications is limited
  – We could not obtain its source code
• Globus (Argonne National Labs / U. of Chicago / USC)
  – Does not focus on QoS-preserving utilization of desktop machines
  – Not object-oriented
  – InteGrade uses CORBA and OO design

Related Work (continued)
• Legion (U. of Virginia)
  – Proprietary distributed object model
  – InteGrade has a deeper focus on idle resource management and desktop machines
• SETI@home (U. of California Berkeley)
  – Hard-coded application
  – No communication between application nodes
• BOINC (U. of California Berkeley)
  – Limited support for parallel applications

Related Work (continued)
• 2K (U. of Illinois at Urbana-Champaign)
  – A CORBA-based distributed operating system
  – Does not focus on grid computing or parallel applications
  – Provided a proof-of-concept prototype for some of the protocols we are using in InteGrade

Implementation Status
• Already implemented:
  – Intra-Cluster Information Update Protocol
  – Intra-Cluster Execution Protocol
    • Sequential applications
    • Parametric applications
• Software used:
  – GRM: Java using JacORB
  – LRM: C++ using O2

Implementation Status: ClusterView

Ongoing sub-projects
• Refinements and extensions to architecture and core software infrastructure
• Initial support for parallel applications
• Network discovery and monitoring
• User usage pattern collection and analysis
• Global, wide-area scheduling
• Migration and mobile agents
• Lightweight middleware
• Autonomic computing
  – self-awareness, self-healing, self-adaptation
• Security and privacy

Project Information
• www.ime.usp.br/integrade
• Source code available at FAPESP’s incubadora (anonymous CVS checkout and Web front end)
• Increasing number of students working on the project
• Initial beta version expected for the end of 2003 (alpha version already up and running)
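
Illustration: Usage Pattern Collection and Analysis
The usage pattern feature described earlier (short sampling intervals, grouping into periods, clustering into behavioral categories) can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from InteGrade: the metric (fraction of CPU idle), the 60-minute period length, the helper names `group_into_periods` and `two_means`, and the use of a two-cluster 1-D k-means as a stand-in for whatever clustering algorithm the project adopts.

```python
from statistics import mean

SAMPLE_MINUTES = 5    # sampling interval given as an example on the slide
PERIOD_MINUTES = 60   # hypothetical period length (not stated in the source)


def group_into_periods(samples, per_period=PERIOD_MINUTES // SAMPLE_MINUTES):
    """Average consecutive samples (CPU-idle fractions in [0, 1]) into periods."""
    return [mean(samples[i:i + per_period])
            for i in range(0, len(samples), per_period)]


def two_means(values, iterations=20):
    """Tiny 1-D k-means with k=2: label each value 0 (low) or 1 (high)."""
    lo, hi = min(values), max(values)   # initial centroids
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (ties go to the low one).
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = mean(low) if low else lo
        new_hi = mean(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break                       # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


if __name__ == "__main__":
    # Three hours of made-up samples: mostly idle, busy, mostly idle.
    samples = [0.9] * 12 + [0.2] * 12 + [0.95] * 12
    periods = group_into_periods(samples)
    print(two_means(periods))
```

In this sketch, label 1 marks periods in which the machine was mostly idle and label 0 marks busy periods; a scheduler could prefer nodes whose upcoming periods historically fall in the idle category.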