Legion: The Grid OS
Architecture and User View

Anand Natrajan (anand@virginia.edu)
Marty Humphrey (humphrey@cs.virginia.edu)
The Legion Project, University of Virginia (http://legion.virginia.edu)

Grid Environment
• Computers
• Networks
• People
• Data
• Devices

• Disjoint file systems
• Disjoint namespaces
• Multiple administration domains
• Unpredictable load, availability, failures
• Security problems

Grid OS Requirements
• Wide-area
• High Performance
• Complexity Management
• Extensibility
• Security
• Site Autonomy
• Input / Output
• Heterogeneity
• Fault-tolerance
• Scalability
• Simplicity
• Single Namespace
• Resource Management
• Platform Independence
• Multi-language
• Legacy Support

Legion - A Grid OS

Tools
• MPI / PVM
• P-space studies multi-run
• Parallel C++
• Parallel object-based Fortran
• CORBA binding
• Object migration
• Accounting
• Remote builds and compilations
• Fault-tolerant MPI libraries
• Post-mortem debugger
• Console objects
• Parallel 2D file objects
• Collections
• Licence support

Commercial Support - Avaki Corp.
[Figure labels: Mentat, Legion, Avaki, Web]
• Venture funded
• Headquartered in Boston
• Growing number of employees
• Multi-tiered support offering

Protein Folding with CHARMM
• Molecular Dynamics Simulations
• 100-200 structures to sample (r, Rgyr) space
[Figure: plot with axes r and Rgyr]

Resources Available
HP V-class         CalTech   440 MHz PA-8700   128/128
IBM SP3            UMich     375 MHz Power3    24/24
DEC Alpha          UVa       533 MHz EV56      32/128
IBM Blue Horizon   SDSC      375 MHz Power3    512/1184
Sun HPC 10000      SDSC      400 MHz SMP       32/64
IBM Azure          UTexas    160 MHz Power2    32/64

Transparent Remote Execution
• User initiates “run”
• User/Legion selects site
• Legion copies binaries
• Legion copies input files
• Legion starts job(s)
• Legion monitors progress
• Legion copies output files

Mechanics of CHARMM Runs
• Register binaries
• Create task directories & specification
• Dispatch runs
• Dispatch more runs
[Chart labels: Blue Horizon, CalTech, UTexas, DEC Alpha, UMich, Sun HPC; values (order as extracted): 2%, 1%, 20%, 0%, 0%, 77%]

Types Of Applications
• Legacy applications
• Legion-aware applications
  – I/O library
  – 2D file object
• Applications Using Stdgrid
• Parameter Space Studies
• Parallel Programs
  – MPI, PVM, MPL, Basic Fortran Support (BFS)

Grid Application Requirements
• Security
• Fault-tolerance
• Heterogeneity
• Collaboration
• …
• Legion supports these and other needs

Heterogeneous Runs
[Figure labels: BT-Med, Ocean Model]

Cross-Organisation Collaboration
• Different companies
• Proprietary simulations and data
• Each needs the other
• Form virtual partnership

Platforms
• Windows NT, 2K, 98, 95
• Sun (Solaris)
• SGI (Irix, Origin)
• Intel (Linux, FreeBSD)
• DEC (Unix, Linux)
• Cray (T90, T3E)
• IBM (AIX, SP-2)
• HP (HPUX)

• Codine
• LoadLeveler
• Maui
• PBS
• NQS
• LSF

Applications
• Biochemistry and Molecular Science
• Information Retrieval
• Materials Science
• Climate Modelling
• Neuroscience
• Aerospace
• Astronomy
• Graphics

NPACI - SDSC, UCSD, Caltech, UTexas, UMich, UCB, UVa.
DoD MSRCs - NAVO & ARL, NASA Ames

User View
Command-Line Interface

Setup
• Set up shell environment variables
    . ~legion/setup.sh
  OR
    export LEGION=/home/legion/Legion
    export LEGION_OPR=/home/maya/OPR
    . $LEGION/bin/legion_env.sh
• Specifies where binaries and configuration files can be found
• Sets root context

Login
• Authentication to system
    legion_login /users/stephen
• Currently uses a password; other mechanisms, e.g., a Kerberos ticket, are possible
• Login object (a.k.a. Authentication object) - /users/stephen - is user’s proxy to world
• Login object generates certificate identifying user

Context Space
• Unix-like
    legion_ls  legion_pwd  legion_cd  legion_cat  ...
[Figure: context tree rooted at /, with nodes hosts, mach1, mach2, subdir, prog, home, mydir, file1, users, you, me, tty]

Context Space
• Network-wide, transparent file system
• Location-independent read/write of files
• Convenient transfer of files between context space and local file system
• I/O libraries for access
• Unix-like utilities

Context Example
    legion_ls /

Another Context
    legion_ls /hosts

Yet Another Context
    legion_ls /users

More Context Fun

Other Context Commands
• Locate a LOID in context space
    legion_list_names
• Locate an object on a machine
    legion_whereis
• Find status of an object
    legion_object_info
• List metadata of an object
    legion_list_attributes

Status Of An Object
    legion_object_info -c work

Physical Location Of Object
    legion_whereis -c work

Context Space vs. Local Space
• Local space = your machine’s directory structure
  – OS-specific, Machine-specific
  – Use cp, copy, etc.
  – e.g., C:\Program Files\, /usr/bin, /mnt/disk1
• Context space = Legion’s directory structure
  – OS-independent, Machine-independent
  – Use legion_cp, etc.
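
Sample Session (sketch)
The setup, login and context commands above can be strung together into one session. The script below is a minimal sketch and not part of the Legion distribution: the `run` helper, the `legion_session` function and the `LEGION_DRY_RUN` variable are hypothetical names introduced here, so that each command is only printed unless execution is requested explicitly. The paths, the user /users/stephen and the object name work are the sample values from the slides.

```shell
#!/bin/sh
# Hypothetical helper (not a Legion tool): print the command by default;
# execute it only when LEGION_DRY_RUN=0 and the command is on PATH.
run() {
    if [ "${LEGION_DRY_RUN:-1}" = "0" ] && command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

legion_session() {
    # Environment setup, using the sample paths from the Setup slide.
    LEGION=/home/legion/Legion
    LEGION_OPR=/home/maya/OPR
    export LEGION LEGION_OPR
    # Source the environment script only in real mode and only if it exists.
    if [ "${LEGION_DRY_RUN:-1}" = "0" ] && [ -r "$LEGION/bin/legion_env.sh" ]; then
        . "$LEGION/bin/legion_env.sh"
    fi

    run legion_login /users/stephen   # authenticate via the login object
    run legion_pwd                    # show the current context
    run legion_ls /                   # list the root context
    run legion_ls /hosts
    run legion_ls /users
    run legion_object_info -c work    # status of the object named work
    run legion_whereis -c work        # physical location of that object
}

legion_session
```

With the default settings the script prints the seven commands in order; setting LEGION_DRY_RUN=0 on a machine where Legion is installed would run them instead.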
Context Space and Local Space
• Transfer one file from local space to context space
    legion_cp -localsrc <localfile> <contextfile>
• Transfer one file from context space to local space
    legion_cp -localdest <contextfile> <localfile>

Context Space and Local Space
• Copying local directory to context space
    legion_cp -r -localsrc <localdir> <contextdir>
  OR
    legion_import_tree <localdir> <contextdir>
• Copying context directory to local space
    legion_cp -r -localdest <contextdir> <localdir>

Context Space and Local Space
• Map (not copy!) local directory to context space temporarily
    legion_export_dir <localdir> <contextdir>
• Does NOT make copy of local directory
• Merely provides Legion-like access to local directory
  – Use legion_cat on local files

Making Context Space…
• Local sub-directory with Legion NFS daemon
  – Use cat on context files
• FTP directory with FTP interface
• Windows directory with Samba interface
• URL tree with HTTP interface

I/O Performance
[Figure: Large Read Aggregate Bandwidth; series NFS, lnfsd, LegionFS; x-axis Number of readers (1 to 50); y-axis Bandwidth (MB/sec), 0 to 200]
  – X-Axis = number of clients simultaneously performing 1MB reads on 10MB files
  – Y-Axis = total read bandwidth
  – Each point = average of multiple runs
  – Clients = 400MHz Intels, NFS Server = 800MHz Intel

Flexible Context Space
[Figure labels: Disk Directory, Context Directory, legion_import_tree, legion_export_dir, Samba, NFS, FTP, HTTP]

Access Control
• MayI for each object implements access control on a per-function basis
• Users named by login object
• Sets of users grouped by contexts
    legion_change_permissions [+-rwx] [-v] <group/user context> <target context>
    legion_change_permissions +r /users/fred /home/grimshaw/myfile

Access Control Example
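
Sharing A File (sketch)
File transfer and access control combine naturally: copy a file into context space, grant another user read access, and later copy it back out. The script below is a minimal sketch that uses only the commands and flags shown above. The names `run`, `share_file`, `fetch_file` and `LEGION_DRY_RUN` are hypothetical helpers introduced here, and the local paths ./myfile and ./myfile.copy are assumptions; /users/fred and /home/grimshaw/myfile are the sample values from the Access Control slide.

```shell
#!/bin/sh
# Hypothetical helper (not a Legion tool): print the command by default;
# execute it only when LEGION_DRY_RUN=0 and the command is on PATH.
run() {
    if [ "${LEGION_DRY_RUN:-1}" = "0" ] && command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# share_file <localfile> <contextfile> <user context>
# Copy a local file into context space, then grant read access to one user.
share_file() {
    run legion_cp -localsrc "$1" "$2" &&
    run legion_change_permissions +r "$3" "$2"
}

# fetch_file <contextfile> <localfile>
# Copy a file from context space back to local space.
fetch_file() {
    run legion_cp -localdest "$1" "$2"
}

share_file ./myfile /home/grimshaw/myfile /users/fred
fetch_file /home/grimshaw/myfile ./myfile.copy
```

Chaining the two steps of share_file with && means that, when the commands are actually executed, permissions are changed only if the copy succeeded.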
Unified Console
• User creates tty object
• User starts running program
• Legion passes tty LOID to program
• Program produces stdout, stderr
• User shares tty LOID
[Figure labels: TTY, File, Prog.]

TTY Object
• Redirect run-time output to central (or multiple) consoles
• Connect and disconnect dynamically
• Debug quickly and simply
• Monitor status, errors, easily
• Share console with others
    legion_tty <ttyobj>

User View
Web Interface

Logging In
Listing Contents Of A Context
Control Window
Status Window
StdOut Window
StdErr Window
Listing Classes (Contents of /class)
Listing Hosts (Contents of /hosts)
List Attributes Of An Object
Start A Run
Check The Status Of A Job
Start An Amber (BioGrid) Run
Check The Status Of An Amber Run
Graphically Check An Amber Run
Interact With Amber Run
Start A Hawley-Hydro Run
Check The Status Of A Hydro Run
Graphically Check A Hydro Run
Run RenderGrid Jobs (P-Space Jobs)
Check The Status of A RenderGrid Job
Check Accounting Logs

User View
Windows Interface

Windows Browser

Context Space in Windows
• Ability to export local directories into Legion’s context space
• Easy-to-use interface
• Ability of users to control when shared directories are visible to other users

Access Control
• Ability of users to specify access control policies
• Fine-grained nature of policies
• Allow/Deny read access to users or groups
• Allow/Deny write access to users or groups
• Ease with which access rights can be changed
• Speed at which access rights are propagated through Legion space

Windows Legion FTP Daemon

Windows Job Sandbox

Windows Process Control

National Legion Net

Summary
• Philosophy
  – Grid as a Single Virtual Machine
  – Provide mechanisms; let others build policies
• Architecture
  – Object-based, integrated
  – Default policies for scheduling, security, …
• User Interfaces
  – Command-line, Web, Windows, FTP, HTTP, …

Future Directions
• Improved user interfaces
• More robust system
• Research activities - University of Virginia
• Commercial activities - Avaki Corporation
• Legion-G?
• Continued participation @GGFs
• Continued support for nationwide grid, grid applications